RAF: Resolution-Adaptive Federated Learning

Updated 3 July 2026

Resolution-Adaptive Federated Learning (RAF) is a framework designed to adapt federated learning to heterogeneous data resolutions using tailored quantization and distillation techniques.
It integrates mixed-resolution quantization with dynamic power control in wireless networks and employs multi-resolution knowledge distillation to enhance accuracy in high-resolution regression tasks.
Reported results include PCK gains of up to +5.4 and communication overhead reductions of over 96%, in both non-IID and bandwidth-constrained settings.

Resolution-Adaptive Federated Learning (RAF) encompasses a family of federated learning frameworks designed to address heterogeneity in data resolution, both in communication-constrained wireless networks and across clients whose input data differ in resolution. Two primary instantiations have emerged: (1) mixed-resolution quantization and dynamic power control in cell-free massive MIMO (CFmMIMO) systems, and (2) multi-resolution knowledge distillation for high-resolution regression tasks under non-uniform client input resolutions. Both approaches center on adapting to resolution, whether in quantization granularity (for communication efficiency and straggler mitigation) or in learned representations (for robustness to spatial-scale diversity), and both integrate into standard FL workflows (Mahmoudi et al., 2024, Lim et al., 31 Jul 2025).

1. Resolution Drift and Non-IID Data in Federated Learning

Standard federated learning research typically addresses non-IID data distributions by considering label or feature skew across clients. However, a distinct form of heterogeneity—termed resolution-drift—emerges in scenarios where client data differ systematically in spatial resolution. In high-resolution regression tasks (e.g., keypoint detection), this form of heterogeneity cannot be addressed by mechanisms targeting class-level imbalance. Empirically, federated averaging (FedAvg) over clients with resolutions {128×96, 192×144, 256×192} results in a decrease in test accuracy at each resolution compared to homogeneous training; for instance, low-resolution Percentage of Correct Keypoints (PCK) drops from ~52.8 (homogeneous) to ~51.3–51.6 (heterogeneous), confirming that naive aggregation yields poorly generalized representations (Lim et al., 31 Jul 2025).

In wireless CFmMIMO FL deployments, a different but related resolution challenge arises from the need to efficiently communicate large, high-resolution gradient vectors under strict bandwidth and latency constraints. Here, resolution-adaptivity pertains to the quantization fidelity of gradient updates, with an explicit goal of trading accuracy for communication efficiency while avoiding straggler effects (Mahmoudi et al., 2024).

2. Mixed-Resolution Quantization and Power Allocation (Wireless FL)

RAF in CFmMIMO settings introduces an adaptive mixed-resolution quantization scheme for gradient updates:

Essential-Entry Selection: Each client applies a per-user threshold $\lambda_j \in (0,1)$ to classify gradient entries in $\Delta_t^j \in \mathbb{R}^d$ as "high-resolution" (if $|\left[\Delta_t^j\right]_i|/\|\Delta_t^j\|_\infty \geq \lambda_j$ ) or "low-resolution" otherwise.
Bit Allocation: High-resolution entries are quantized with $b_j \geq 2$ bits, while low-resolution entries use 1-bit sign quantization. The total bit budget is $b_t^j = d(b_j s_t^j + (1-s_t^j)) + 32$ , where $s_t^j$ is the fraction of high-resolution entries and the extra 32 bits encode the quantization radius.
Quantizer Construction: The quantizer $\mathcal{Q}(\cdot)$ maps low-magnitude entries to a fixed value (half the grid radius) and applies uniform quantization to high-magnitude entries; the exact case definitions are given in Mahmoudi et al. (2024).
Error and Convergence: The infinity-norm quantization error is upper bounded as $\|\Delta_t^j - \widehat{\Delta}_t^j\|_\infty \leq c_j \|\Delta_t^j\|_\infty$, with $c_j$ a function of $\lambda_j$ and $b_j$. The algorithm provably converges, with the attainable accuracy determined by the quantization error.
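The selection and bit-accounting steps above can be sketched in Python with NumPy. This is a minimal illustration, not the quantizer of Mahmoudi et al. (2024): the function name `mixed_resolution_quantize` is hypothetical, and the reconstruction rules (sign-only entries mapped to half the threshold level $\lambda_j R$ with $R = \|\Delta_t^j\|_\infty$, and a uniform grid on $[\lambda_j R, R]$ for high-resolution entries) are assumptions chosen to match the description.

```python
import numpy as np


def mixed_resolution_quantize(delta, lam, bits):
    """Hypothetical mixed-resolution quantizer for one client's update vector.

    Entries with |delta_i| / ||delta||_inf >= lam are 'high-resolution' and get
    `bits` bits (1 sign bit + bits-1 magnitude bits); all others get one sign bit.
    Returns the dequantized vector and the total bit count b_t^j.
    """
    assert 0.0 < lam < 1.0 and bits >= 2
    delta = np.asarray(delta, dtype=float)
    d = delta.size
    radius = float(np.max(np.abs(delta)))  # ||delta||_inf, sent as one 32-bit float
    if radius == 0.0:
        return np.zeros_like(delta), float(d + 32)  # every entry low-resolution (s = 0)

    high = np.abs(delta) / radius >= lam  # essential-entry mask
    s = float(high.mean())  # fraction of high-resolution entries
    q = np.empty_like(delta)

    # Low-resolution entries (assumed rule): keep only the sign, reconstruct at lam * R / 2.
    q[~high] = np.sign(delta[~high]) * lam * radius / 2

    # High-resolution entries (assumed rule): uniform grid of 2**(bits-1) levels on [lam*R, R].
    levels = 2 ** (bits - 1)
    step = (1.0 - lam) * radius / (levels - 1)
    idx = np.clip(np.rint((np.abs(delta[high]) - lam * radius) / step), 0, levels - 1)
    q[high] = np.sign(delta[high]) * (lam * radius + idx * step)

    total_bits = d * (bits * s + (1 - s)) + 32  # b_t^j = d(b_j s + (1 - s)) + 32
    return q, total_bits


# Usage on a random update vector (illustrative threshold and bit width).
rng = np.random.default_rng(0)
update = rng.standard_normal(10_000)
q, nbits = mixed_resolution_quantize(update, lam=0.5, bits=8)
print(f"bits sent: {nbits:.0f} vs full precision: {32 * update.size}")
```

In this sketch the error is at most $\max\{\lambda_j/2,\ (1-\lambda_j)/(2(2^{b_j-1}-1))\}\,\|\Delta_t^j\|_\infty$, one concrete example of a constant $c_j$ that depends only on $\lambda_j$ and $b_j$.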

Power control further optimizes per-user transmit powers $p_j$ to balance uplink rates and mitigate straggler latency. RAF maximizes the minimum “rate-per-bit” $\min_j R_j / b_t^j$, where $R_j$ is the achievable uplink rate of user $j$ (so $R_j / b_t^j$ is the inverse of that user's upload time), subject to linearized SINR constraints. The problem is solved efficiently by bisection over the target rate-per-bit, with an LP feasibility check at each iteration.
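The bisection structure can be illustrated on a toy interference model. This is a sketch under explicit assumptions, not the CFmMIMO formulation of the paper: the SINR of user $j$ is taken as $G_{jj}p_j / (\sum_{k\neq j} G_{jk}p_k + \sigma_j^2)$ for a hypothetical gain matrix $G$, rates follow $R_j = B\log_2(1+\mathrm{SINR}_j)$, and each feasibility check uses the closed-form minimum-power solution of the linear SINR constraints in place of a general LP solver. All function names are illustrative.

```python
import numpy as np


def sinr(p, G, noise):
    """Toy SINR: G[j, j] p_j / (sum_{k != j} G[j, k] p_k + noise_j)."""
    signal = np.diag(G) * p
    return signal / (G @ p - signal + noise)


def min_power_for_targets(gamma, G, noise):
    """Smallest powers meeting SINR_j >= gamma_j, or None if infeasible.

    For fixed targets the constraints are linear: (I - F) p >= u, with
    F[j, k] = gamma_j G[j, k] / G[j, j] (k != j) and u_j = gamma_j noise_j / G[j, j].
    With u > 0, a nonnegative solution exists iff the spectral radius of F is below 1.
    """
    diag = np.diag(G)
    F = (gamma / diag)[:, None] * G
    np.fill_diagonal(F, 0.0)
    if np.max(np.abs(np.linalg.eigvals(F))) >= 1.0:
        return None
    u = gamma * noise / diag
    return np.linalg.solve(np.eye(len(gamma)) - F, u)


def maxmin_rate_per_bit(G, noise, bits, p_max, bandwidth, iters=100):
    """Bisection on the common target t = min_j R_j / bits_j (units: 1/s)."""
    # Upper bound: each user alone at full power with no interference.
    hi = float(np.min(bandwidth * np.log2(1 + np.diag(G) * p_max / noise) / bits))
    lo, p_best = 0.0, np.zeros(len(bits))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        gamma = 2.0 ** (mid * bits / bandwidth) - 1.0  # SINR target implied by rate-per-bit mid
        p = min_power_for_targets(gamma, G, noise)
        if p is not None and np.all(p <= p_max):
            lo, p_best = mid, p  # feasible: raise the target
        else:
            hi = mid  # infeasible: lower the target
    return lo, p_best


G = np.array([[1.0, 0.1], [0.2, 0.8]])  # hypothetical channel gains
noise = np.array([0.1, 0.1])
bits = np.array([1.0e3, 2.0e3])  # per-user upload sizes b_t^j
t, p = maxmin_rate_per_bit(G, noise, bits, p_max=1.0, bandwidth=1.0e6)
print(t, p)
```

At the returned powers every user attains the same rate-per-bit, i.e. all uploads finish at the same time, which is exactly the behavior that removes stragglers from a synchronous round.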

RAF’s joint quantization and power control reduces communication overhead by at least 93% on CIFAR-10, CIFAR-100, and Fashion-MNIST. With suitable choices of $\lambda_j$ and $b_j$, test accuracy remains within 1% of uncompressed FL while overhead drops by at least 96%. Under latency constraints, more global rounds fit within the budget than with benchmark schemes, which translates directly into more learning progress within a fixed time budget (Mahmoudi et al., 2024).
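A quick calculation with the bit budget $b_t^j = d(b_j s_t^j + (1-s_t^j)) + 32$ shows where reductions of this size come from. Take illustrative values $b_j = 8$ and $s_t^j = 0.05$ (hypothetical, not a configuration reported in the paper) and compare against $32d$ bits for full-precision floats:

$$\frac{b_t^j}{32d} = \frac{d\,(8 \cdot 0.05 + 0.95) + 32}{32d} = \frac{1.35\,d + 32}{32d} \approx \frac{1.35}{32} \approx 0.042 \quad (d \gg 1),$$

which is a reduction of about 95.8%. Since the per-entry cost $1 + (b_j - 1)s_t^j$ can never drop below one bit, the reduction relative to 32-bit floats is capped near $1 - 1/32 \approx 96.9\%$.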

3. Multi-Resolution Knowledge Distillation for High-Resolution Regression

Resolution-Adaptive Federated Learning in the context of non-uniform input resolutions employs multi-resolution knowledge distillation (MRKD) as a local loss augmentation on each client:

Model Architecture: The backbone is ViTPose (ViT-S variant), made resolution-flexible by replacing fixed absolute positional encodings with convolutional embeddings: a global positional embedding (GPE) applied after patch embedding and a local positional embedding (LPE) inside each transformer block.
Multi-Resolution Processing: Each client builds a resolution pyramid by downsampling each input (e.g., the native image plus two lower resolutions). Every level is fed through the shared model to produce heatmaps.
Loss Formulation: The primary task loss is MSE at native resolution. The distillation loss enforces consistency across resolutions by upsampling each lower-resolution prediction and aligning it with the native-resolution prediction; only the low-resolution (student) branch is updated by this term. The total objective is a weighted sum of the task and distillation losses.
Algorithmic Workflow: RAF integrates into FedAvg/FedProx as a drop-in modification, with no additional communication and no changes to server aggregation. Each round runs local epochs of MRKD-augmented training, followed by standard server aggregation.
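The MRKD objective described above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' code: `toy_pose_model` is a hypothetical stand-in for ViTPose that emits heatmaps at one quarter of the input resolution, nearest-neighbor resizing stands in for whatever resampling the paper uses, and the pyramid scales `(0.75, 0.5)` (which turn a 256×192 input into 192×144 and 128×96 levels) and `kd_weight=0.5` are placeholder values. NumPy has no autodiff, so the sketch only computes loss values; in PyTorch or JAX the teacher prediction would be detached so that gradients flow only through the student branch.

```python
import numpy as np


def resize_nearest(x, out_h, out_w):
    """Nearest-neighbor resize over the last two axes (works for down- and upsampling)."""
    in_h, in_w = x.shape[-2:]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return x[..., rows[:, None], cols[None, :]]


def toy_pose_model(image, num_joints=3):
    """Hypothetical stand-in for ViTPose: K heatmaps at 1/4 of the input resolution."""
    h, w = image.shape
    feat = resize_nearest(image, h // 4, w // 4)
    return np.stack([(k + 1) * feat for k in range(num_joints)])


def mse(a, b):
    return float(np.mean((a - b) ** 2))


def mrkd_loss(model, image, target, scales=(0.75, 0.5), kd_weight=0.5):
    """Task MSE at native resolution plus weighted multi-resolution distillation MSE."""
    h, w = image.shape
    teacher = model(image)  # native-resolution heatmaps act as the teacher
    task = mse(teacher, target)
    kd = 0.0
    for s in scales:
        low = resize_nearest(image, int(h * s), int(w * s))  # one pyramid level
        student = resize_nearest(model(low), *teacher.shape[-2:])  # upsample to teacher size
        # With autodiff, `teacher` would be wrapped in stop_gradient/detach at this point.
        kd += mse(student, teacher)
    return task + kd_weight * kd, task, kd


rng = np.random.default_rng(0)
image = rng.random((256, 192))  # native input
target = rng.random((3, 64, 48))  # hypothetical ground-truth heatmaps
total, task, kd = mrkd_loss(toy_pose_model, image, target)
print(f"total={total:.4f} task={task:.4f} kd={kd:.4f}")
```

Because the distillation term is a plain MSE between heatmaps, no temperature hyperparameter appears anywhere in the loss.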

Theoretically, under the stated assumptions (bounded features and targets, unbiased stochastic gradients), this local objective is smooth and $\mu$-strongly convex with an explicit Lipschitz constant. FedAvg combined with RAF then achieves the optimal $\mathcal{O}(1/T)$ convergence rate in expectation, matching centralized rates (Lim et al., 31 Jul 2025).

4. Empirical Performance and Quantitative Findings

In wireless quantized RAF, experiments on IID and non-IID splits of CIFAR-10, CIFAR-100, and Fashion-MNIST demonstrate:

Overhead reduction of 75–96% compared to 32-bit float FL, with no significant reduction in test accuracy.
Under strict (e.g., 3 s) latency budgets, RAF enables up to 27 global rounds (vs. 17 for baselines such as AQUILA, LAQ, and Top-$k$), with 10–20% higher test accuracy in challenging non-IID settings (Mahmoudi et al., 2024).
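The link between per-round latency and the number of rounds that fit in a budget is simple arithmetic under a synchronous protocol. The sketch below assumes a toy latency model (per-round time equals a fixed compute time plus the slowest user's upload time, bits divided by rate) and uses hypothetical numbers; it is not the latency model of Mahmoudi et al. (2024), and `rounds_within_budget` is an illustrative name.

```python
import math


def rounds_within_budget(upload_bits, uplink_rates, budget_s, compute_s=0.0):
    """Toy synchronous-FL latency model: every round waits for the slowest upload."""
    per_round = compute_s + max(b / r for b, r in zip(upload_bits, uplink_rates))
    return math.floor(budget_s / per_round), per_round


# Hypothetical numbers: two users at 10 Mbit/s; the second uploads twice as many bits.
print(rounds_within_budget([1.0e6, 2.0e6], [1.0e7, 1.0e7], budget_s=3.05))
# Shrinking only the straggler's payload doubles the number of rounds that fit (15 -> 30).
print(rounds_within_budget([1.0e6, 1.0e6], [1.0e7, 1.0e7], budget_s=3.05))
```

Only the slowest upload matters in this model, which is why compressing updates and equalizing rate-per-bit both convert directly into extra rounds.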

For resolution-drift mitigation in keypoint detection:

On MPII Pose with three-client splits (128×96, 192×144, 256×192), RAF improves test PCK by up to +5.4 over FedAvg and +4.9 over FedProx at the lowest resolution. Gains persist at all evaluated resolutions and extend to held-out resolutions not seen during training.
Heatmap visualizations show that RAF produces sharper, higher-fidelity joint predictions, supported by t-SNE visualizations revealing well-separated clusters by resolution when MRKD is used.
Ablations comparing centralized training with/without MRKD confirm that knowledge distillation, not just increased data or parameter count, is the key factor in resolution robustness.

5. Insights, Design Guidelines, and Integration

The two RAF instantiations yield the following design guidelines:

Quantization Threshold ($\lambda_j$): Lower thresholds classify more entries as high-resolution, improving accuracy at the cost of more bits and longer per-round latency. Selection should account for both channel conditions and task error tolerance.
Bit Budget ($b_j$): 6–12 bits per high-resolution entry generally suffice; additional bits yield diminishing returns.
Power Control: Because each synchronous round lasts as long as the slowest upload, the max-min “rate-per-bit” criterion directly targets stragglers and paces global progress better than sum-rate or energy-centric alternatives.
Positional Encoding: Resolution-agnostic backbone adaptations (e.g., conv-based GPE/LPE) are essential for multi-scale tasks; fixed encodings create brittleness when input sizes vary.
Distillation Weighting: The distillation weight balances task fit against resolution consistency. No temperature parameter is needed because the distillation term is a raw MSE between heatmaps, which simplifies practical deployment.
Orthogonality: The MRKD component is strictly local to the client, so RAF can be combined with existing server-side FL workflows (FedAvg, FedProx, SCAFFOLD, etc.) without protocol modifications.
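The Positional Encoding guideline can be made concrete with a convolutional positional embedding: a depthwise 3×3 convolution over the token grid whose output is added back to the tokens. This sketch follows the general idea of convolution-based positional encodings and is not the exact GPE/LPE design used with ViTPose; the function name, kernel size, kernel scale, and the 16×16 patch size used to derive the grids are assumptions. Because the kernel does not depend on grid size, the same weights serve token grids from any input resolution.

```python
import numpy as np


def conv_positional_embedding(tokens, grid_h, grid_w, kernel):
    """Depthwise 3x3 convolutional positional embedding, added residually to patch tokens.

    tokens: (grid_h * grid_w, C) patch tokens; kernel: (C, 3, 3) depthwise weights.
    Zero padding at the borders lets the convolution sense where a token sits,
    and the same kernel applies to any grid size.
    """
    n, c = tokens.shape
    assert n == grid_h * grid_w and kernel.shape == (c, 3, 3)
    x = tokens.reshape(grid_h, grid_w, c)
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))  # zero padding on the spatial axes
    pe = np.zeros_like(x)
    for di in range(3):
        for dj in range(3):
            # Shifted view of the grid times the per-channel weight at offset (di-1, dj-1).
            pe += padded[di:di + grid_h, dj:dj + grid_w, :] * kernel[:, di, dj]
    return (x + pe).reshape(n, c)


rng = np.random.default_rng(0)
kernel = rng.standard_normal((8, 3, 3)) * 0.02  # one shared (hypothetical) kernel
for h, w in [(8, 6), (16, 12)]:  # grids for 128x96 and 256x192 inputs with 16x16 patches
    out = conv_positional_embedding(rng.standard_normal((h * w, 8)), h, w, kernel)
    print(out.shape)
```

A fixed absolute embedding table, by contrast, has one row per token position for a single grid size and must be interpolated whenever the input resolution changes.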

A practical RAF deployment for regression tasks comprises four steps: replacing fixed positional encodings with resolution-agnostic ones, multi-scale downsampling on each client, training with the MRKD-augmented loss, and FedAvg parameter aggregation.
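The server side of that recipe is ordinary FedAvg. The sketch below assumes parameters are stored as dicts of NumPy arrays and uses a hypothetical `local_train` callback to mark where MRKD-augmented local epochs would run; `fedavg` and `federated_round` are illustrative names, and nothing on the server is specific to RAF.

```python
import numpy as np


def fedavg(client_states, num_samples):
    """Sample-size-weighted average of client parameter dicts (standard FedAvg aggregation)."""
    total = float(sum(num_samples))
    return {
        name: sum(n / total * np.asarray(state[name], dtype=float)
                  for state, n in zip(client_states, num_samples))
        for name in client_states[0]
    }


def federated_round(global_state, client_datasets, local_train):
    """One round: local training on every client, then unchanged FedAvg on the server.

    `local_train(state, data)` is a hypothetical callback; RAF would run
    MRKD-augmented local epochs inside it, leaving the server untouched.
    """
    states = [local_train(dict(global_state), data) for data in client_datasets]
    return fedavg(states, [len(data) for data in client_datasets])


def toy_local_train(state, data):
    # Placeholder update; a real client would run several epochs of MRKD-augmented SGD here.
    return {"w": state["w"] + np.mean(data)}


new_state = federated_round({"w": np.zeros(2)}, [[1.0], [3.0, 3.0, 3.0]], toy_local_train)
print(new_state["w"])
```

Swapping `fedavg` for another server-side aggregator (e.g., one used by FedProx) leaves `local_train`, and therefore the MRKD loss, unchanged.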

6. Application Scope and Generalization

RAF extends to settings where data resolution is an axis of heterogeneity, whether due to wireless communication constraints (gradients, parameter updates) or to heterogeneous native data (high-resolution regression or representation learning tasks). t-SNE analyses reveal the same resolution-sensitive feature clusters in segmentation and landmark localization models, indicating that the MRKD core is relevant beyond keypoint detection. Conversely, standard classification models (e.g., ResNet-50 on ImageNet) do not show this sensitivity, underscoring that RAF’s value is specific to spatially structured tasks.

In wireless FL, RAF’s design rests on the compressibility of modern gradients and the need for straggler-aware optimization in cell-free massive MIMO. In federated vision, it relies on resolution-agnostic positional encodings and multi-resolution distillation for scale-robust representation learning. Both lines of work are modular and backed by convergence analyses, addressing emerging bottlenecks in large-scale, heterogeneous federated deployments (Mahmoudi et al., 2024, Lim et al., 31 Jul 2025).

References

Mahmoudi et al. (2024). Adaptive Quantization Resolution and Power Control for Federated Learning over Cell-free Networks.

Lim et al. (2025). Mitigating Resolution-Drift in Federated Learning: Case of Keypoint Detection.
