EfficientFSL: Efficient Federated & Few-Shot Learning
- EfficientFSL is a framework comprising novel architectures, algorithms, and training strategies that enable resource-efficient learning in federated split and few-shot settings.
- It achieves substantial reductions in communication and server storage, with empirical results showing up to 95% communication savings and 70% storage reduction.
- The approach minimizes tunable parameters via query-only adaptation in Vision Transformers, ensuring rapid convergence and efficient edge deployment.
EfficientFSL encompasses a family of architectures, algorithms, and training strategies designed for resource-efficient learning in split federated and few-shot scenarios. Central themes include maximizing accuracy and convergence speed while minimizing computational, communication, and storage overhead required for distributed edge devices and large-scale models. This entry covers both federated split learning (FSL) variants aiming at communication/storage efficiency and the "EfficientFSL" framework for few-shot classification with Vision Transformers.
1. Motivation and Background
Federated learning (FL) and split learning (SL) are principal strategies for distributed training without sharing raw data. FL requires each client to store and update the full model, which can be prohibitive for deep architectures. Split learning offloads server-side computation by "cutting" the model: the client processes only the front portion and transmits intermediate ("smashed") activations to the server, which finishes the forward and backward passes. However, standard FSL inherits high communication and storage costs due to frequent activation and gradient transmission, and often exhibits poor adaptability to heterogeneous devices.
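The cut-model protocol above can be sketched as one training round. This is a minimal toy illustration, not any paper's implementation: the "model" is two scalar linear stages, the client sends its smashed activation, and the server returns the gradient with respect to that activation so the client can finish backpropagation.

```python
# Toy sketch of one split-learning round. Assumption: a scalar two-stage
# "model" stands in for a deep network cut at some layer.
def client_forward(x, w_client):
    """Front portion of the model, run on the edge device."""
    return w_client * x  # smashed activation, sent to the server

def server_step(a, y, w_server, lr=0.1):
    """Server finishes the forward pass on the smashed activation `a`,
    computes a squared-error loss, updates its own weights, and returns
    the gradient w.r.t. `a` for the client."""
    pred = w_server * a
    dL_dpred = 2.0 * (pred - y)          # d/dpred of (pred - y)^2
    grad_w_server = dL_dpred * a
    grad_a = dL_dpred * w_server         # sent back to the client
    return w_server - lr * grad_w_server, grad_a

def client_backward(x, w_client, grad_a, lr=0.1):
    """Client completes backprop using only the returned activation grad."""
    return w_client - lr * grad_a * x

w_c, w_s = 0.5, 0.5
x, y = 1.0, 2.0
for _ in range(50):
    a = client_forward(x, w_c)
    w_s, g = server_step(a, y, w_s)
    w_c = client_backward(x, w_c, g)
print(round(w_c * w_s, 3))  # product approaches the target y = 2.0
```

Note that every batch costs one activation upload and one gradient download; the EfficientFSL mechanisms below attack exactly this per-batch traffic.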
EfficientFSL frameworks (e.g., CSE-FSL (Mu et al., 2023), FSL-SAGE (2505.23182)) address these limitations by introducing local loss computation with auxiliary networks, event-driven or sparse activation transmission, and global model aggregation to reduce both communication and server storage. Separately, EfficientFSL in few-shot learning (Liao et al., 13 Jan 2026) focuses on adapting large pre-trained models using minimal parameters, aiming for competitive accuracy on N-way K-shot tasks with limited computational resources.
2. Architectural Strategies
In federated split learning, the typical model architecture is decomposed as follows:
| Component | Location | Role |
|---|---|---|
| Client-side Model | Edge device | Extracts local features; lightweight |
| Auxiliary Network | Edge device | Computes local surrogate loss; small size |
| Server-side Model | Central server | Receives smashed activations; final updates |
In "EfficientFSL" for ViTs, the architecture includes:
- Forward Block: Synthesizes task-specific queries using trainable prompts and bottleneck projections over the frozen ViT backbone.
- Combine Block: Fuses multi-layer outputs via shared alignment and conditional weighting.
- Support-Query Attention Block: Adjusts class prototypes to mitigate domain shifts between support and query sets.
The modular insertion of trainable blocks allows query-only adaptation, isolating nearly all learning to a minimal set of new parameters (1–2M vs. tens of millions in full fine-tuning) (Liao et al., 13 Jan 2026).
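The parameter economy of query-only adaptation comes from small residual bottleneck projections over a frozen backbone. The sketch below is a hypothetical illustration of that idea, not the paper's exact blocks: a frozen d-dimensional feature passes through a trainable down/up projection pair, so only 2·d·r weights are tuned; zero-initializing the up-projection makes the adapter start as an identity map.

```python
import random

# Hypothetical bottleneck-adapter sketch (names and init scheme assumed).
def make_bottleneck(d, r, seed=0):
    rng = random.Random(seed)
    down = [[rng.gauss(0.0, 0.02) for _ in range(r)] for _ in range(d)]
    up = [[0.0] * d for _ in range(r)]  # zero-init: adapter starts as identity
    return down, up

def adapt(feat, down, up):
    """Residual bottleneck: feat + up(down(feat))."""
    d, r = len(down), len(down[0])
    hidden = [sum(feat[i] * down[i][j] for i in range(d)) for j in range(r)]
    delta = [sum(hidden[j] * up[j][k] for j in range(r)) for k in range(d)]
    return [f + dl for f, dl in zip(feat, delta)]

d, r = 384, 16            # ViT-S feature width; small bottleneck rank (assumed)
down, up = make_bottleneck(d, r)
trainable = d * r + r * d
print(trainable)          # 12288 adapter weights vs ~21.7M frozen backbone params
```

With a handful of such blocks plus prompts, the trainable budget stays in the 1–2M range cited above.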
3. Communication and Storage Efficiency
EfficientFSL strategies in federated split learning decrease bandwidth and server storage via multiple mechanisms:
- Sparse Transmission: Clients upload smashed data only once every h batches, rather than every batch, with h adjustable to trade off compute against communication cost (Mu et al., 2023, Mu et al., 21 Jul 2025).
- Single-Server Model: The server maintains just one global server-side model and one auxiliary net shared by all clients, rather than one copy per client (Mu et al., 2023, Mu et al., 21 Jul 2025).
Empirical results demonstrate up to 95% communication reduction (CSE-FSL: 9.5 GB vs. 172 GB) and 70% server storage reduction compared to vanilla FSL (Mu et al., 21 Jul 2025).
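The effect of the sparse-upload schedule can be made concrete with a short sketch (the interval symbol h follows the CSE-FSL usage in the results table below; the function name is illustrative):

```python
# Sketch of the sparse-upload schedule: smashed activations are sent
# only on batches that fall on the interval h.
def upload_schedule(num_batches, h):
    """Return the batch indices at which smashed activations are uploaded."""
    return [b for b in range(num_batches) if b % h == 0]

full = upload_schedule(100, 1)    # vanilla FSL: upload every batch
sparse = upload_schedule(100, 5)  # CSE-FSL-style schedule with h = 5
print(len(full), len(sparse))     # 100 20 -> 80% fewer uploads
```

Larger h saves more bandwidth but delays server feedback, which is the compute/communication trade-off noted above.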
In the few-shot EfficientFSL (Liao et al., 13 Jan 2026):
- Freezing the backbone means only the small adaptive blocks are tuned.
- For ViT-S/16, trainable parameters drop from 21.7M (full fine-tuning) to 1.25M, peak GPU memory to 0.49 GB, and per-epoch training time to 23.6 s.
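A simple parameter accounting makes the footprint reduction explicit. The backbone and total-trainable figures come from the text; the split of the 1.25M across the three blocks is an assumption for illustration only.

```python
# Illustrative parameter accounting under query-only tuning.
# (name, (param_count, is_trainable)); the per-block split is assumed.
params = {
    "vit_s_backbone": (21_700_000, False),   # frozen, per the text
    "forward_block": (600_000, True),        # assumed
    "combine_block": (400_000, True),        # assumed
    "support_query_attn": (250_000, True),   # assumed
}
trainable = sum(n for n, t in params.values() if t)
total = sum(n for n, _ in params.values())
print(trainable, round(trainable / total * 100, 1))  # 1250000 5.4
```

Roughly 5% of the deployed parameters receive gradients, which is what drives the memory and per-epoch time figures above.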
4. Learning and Optimization Algorithms
Federated EfficientFSL frameworks implement:
- Local Surrogate Loss: Each client computes a local surrogate loss via the auxiliary network, updating client-side parameters without immediate server feedback (Mu et al., 2023, Mu et al., 21 Jul 2025).
- Periodic Aggregation: Client-side models are aggregated into a global model and the server-side model is updated after each epoch; server feedback is batched to reduce round-trip latency.
- Auxiliary Model Estimation: In FSL-SAGE, auxiliary models are trained to imitate server-side backward gradients, allowing asynchronous local updates; auxiliary models are aligned only infrequently, amortizing communication (2505.23182).
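The local-surrogate idea in the first bullet can be sketched with the same scalar toy model used earlier. This is a hypothetical illustration, not the papers' formulation: a small auxiliary head maps the smashed activation to a prediction locally, so the client can take gradient steps with no server round-trip at all.

```python
# Toy sketch of a local surrogate update (scalar model, assumed setup):
# w_aux plays the role of the auxiliary network on the edge device.
def local_step(x, y, w_client, w_aux, lr=0.05):
    a = w_client * x                  # smashed activation (never uploaded here)
    pred = w_aux * a                  # auxiliary head's local prediction
    d = 2.0 * (pred - y)              # d/dpred of squared error
    w_aux = w_aux - lr * d * a
    w_client = w_client - lr * d * w_aux * x   # only local quantities used
    return w_client, w_aux

w_c, w_a = 0.5, 0.5
for _ in range(200):
    w_c, w_a = local_step(1.0, 2.0, w_c, w_a)
print(round(w_c * w_a, 2))  # composed prediction approaches the target 2.0
```

In the real frameworks the server-side model is still updated from periodically uploaded activations; the surrogate only decouples the client's step-by-step progress from that round-trip.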
EfficientFSL for ViTs introduces:
- Query-Only Tuning: Only trainable modules synthesize, aggregate, and align features; the backbone remains frozen.
- Support-Query Attention: Class prototypes are adaptively shifted toward query clusters, and prediction uses cosine similarity to these adjusted prototypes (Liao et al., 13 Jan 2026).
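The prototype-adjustment step can be sketched as follows. The attention-based shift is summarized here as a simple convex move toward the query-set mean, which is an assumption for illustration, not the paper's exact operator; the cosine-similarity classification matches the description above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def shift_prototypes(protos, query_feats, alpha=0.3):
    """Move each class prototype a fraction alpha toward the query mean
    (a stand-in for the learned support-query attention shift)."""
    d = len(protos[0])
    mean = [sum(q[i] for q in query_feats) / len(query_feats) for i in range(d)]
    return [[(1 - alpha) * p[i] + alpha * mean[i] for i in range(d)] for p in protos]

def classify(query, protos):
    """Predict the class whose prototype is most cosine-similar."""
    sims = [cosine(query, p) for p in protos]
    return max(range(len(protos)), key=lambda k: sims[k])

protos = [[1.0, 0.0], [0.0, 1.0]]           # 2-way prototypes (toy features)
queries = [[0.9, 0.2], [0.1, 1.1]]
adjusted = shift_prototypes(protos, queries)
print([classify(q, adjusted) for q in queries])  # -> [0, 1]
```

The shift mitigates support/query domain gaps: prototypes are pulled into the region the query features actually occupy before similarities are scored.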
5. Theoretical Guarantees
The convergence properties of EfficientFSL frameworks are formalized under standard FL assumptions: L-smoothness, bounded gradient variance, and (in CSE-FSL) mild distribution drift conditions.
- Convergence Rate: Both CSE-FSL and FSL-SAGE converge to a first-order stationary point at a rate of O(1/sqrt(T)) for non-convex objectives, matching FedAvg (Mu et al., 2023, 2505.23182, Mu et al., 21 Jul 2025).
- Auxiliary Model Error: In FSL-SAGE, the misalignment between local auxiliary gradient estimation and global split gradient is provably controlled assuming PAC-learnability of the auxiliary function class, yielding the same rate (2505.23182).
A plausible implication is that split neural training on edge clients can attain communication/compute parity with classic FL, provided the auxiliary strategies are well tuned.
6. Empirical Performance and Applications
Quantitative experiments span CIFAR-10, FEMNIST, and cross-domain benchmarks:
| Method | CIFAR-10 Acc (%) | Comm. Load (GB) | Server Storage (M params) |
|---|---|---|---|
| FSL_MC | 80.55 | 172.46 | 5.34 |
| FSL_AN | 77.75 | 93.96 | 5.46 |
| CSE-FSL (h=5) | 76.52 | 18.14 | 1.61 |
- On CIFAR-10 and FEMNIST, CSE-FSL achieves an 80% reduction in smashed-data communication, with server storage that remains constant as the client count grows (Mu et al., 2023, Mu et al., 21 Jul 2025).
- FSL-SAGE achieves top accuracy (85.7% on CIFAR-10 iid, 82.8% niid) with 2.2× less communication than CSE-FSL and substantially outperforms SplitFed and FedAvg under resource constraints (2505.23182).
In few-shot tasks using EfficientFSL (Liao et al., 13 Jan 2026), state-of-the-art results are attained:
| Dataset | ViT-S Acc (1-shot/5-shot) | ViT-B Acc (1-shot/5-shot) |
|---|---|---|
| miniImageNet | 97.40 / 99.05 | 98.34 / 99.12 |
| tieredImageNet | 89.72 / 95.41 | 93.27 / 96.78 |
| CIFAR | 88.82 / 94.60 | 93.25 / 97.28 |
| FC100 | 69.94 / 81.68 | 80.13 / 88.81 |
Results hold across six cross-domain datasets, outperforming specialized meta-learners by 3–6% absolute (Liao et al., 13 Jan 2026).
7. Practical Considerations and Limitations
Key hyperparameters (e.g., the upload interval h, auxiliary model size) trade off memory and bandwidth against convergence and accuracy. Design and tuning remain task dependent, especially for auxiliary architectures. Theoretical guarantees typically assume IID data and full participation, while practical deployments routinely contend with distributional drift and client heterogeneity.
- In CSE-FSL, an excessively large upload interval h may slow learning for low-data clients, while asynchrony and event-triggered server updates do not significantly degrade accuracy (Mu et al., 21 Jul 2025).
- Auxiliary estimation errors in FSL-SAGE are controlled theoretically but demand empirical validation under adversarial or highly non-IID splits (2505.23182).
- EfficientFSL for few-shot classification does not alter privacy guarantees beyond those of vanilla split learning; further security considerations are not addressed (Mu et al., 2023, Liao et al., 13 Jan 2026).
EfficientFSL thus denotes a robust set of split learning frameworks capable of training large models over resource-constrained devices, supporting both federated and few-shot learning paradigms with strong theoretical and empirical efficiency.