EfficientFSL: Efficient Federated & Few-Shot Learning
- EfficientFSL is a framework comprising novel architectures, algorithms, and training strategies that enable resource-efficient learning in federated split and few-shot settings.
- It achieves substantial reductions in communication and server storage, with empirical results showing up to 95% communication savings and 70% storage reduction.
- The approach minimizes tunable parameters via query-only adaptation in Vision Transformers, ensuring rapid convergence and efficient edge deployment.
EfficientFSL encompasses a family of architectures, algorithms, and training strategies designed for resource-efficient learning in split federated and few-shot scenarios. Central themes include maximizing accuracy and convergence speed while minimizing computational, communication, and storage overhead required for distributed edge devices and large-scale models. This entry covers both federated split learning (FSL) variants aiming at communication/storage efficiency and the "EfficientFSL" framework for few-shot classification with Vision Transformers.
1. Motivation and Background
Federated learning (FL) and split learning (SL) are principal strategies for distributed training without sharing raw data. FL requires each client to store and update the full model, which can be prohibitive for deep architectures. Split learning offloads server-side computation by "cutting" the model: the client processes only the front portion and transmits intermediate ("smashed") activations to the server, which finishes the forward and backward passes. However, standard FSL inherits high communication and storage costs due to frequent activation and gradient transmission, and often exhibits poor adaptability to heterogeneous devices.
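The cut-model protocol above can be sketched as one training round. This is a minimal toy illustration, not any paper's implementation: the "model" is two scalar linear stages, the client sends its smashed activation, and the server returns the gradient with respect to that activation so the client can finish backpropagation.

```python
# Toy sketch of one split-learning round. Assumption: a scalar two-stage
# "model" stands in for a deep network cut at some layer.
def client_forward(x, w_client):
    """Front portion of the model, run on the edge device."""
    return w_client * x  # smashed activation, sent to the server

def server_step(a, y, w_server, lr=0.1):
    """Server finishes the forward pass on the smashed activation `a`,
    computes a squared-error loss, updates its own weights, and returns
    the gradient w.r.t. `a` for the client."""
    pred = w_server * a
    dL_dpred = 2.0 * (pred - y)          # d/dpred of (pred - y)^2
    grad_w_server = dL_dpred * a
    grad_a = dL_dpred * w_server         # sent back to the client
    return w_server - lr * grad_w_server, grad_a

def client_backward(x, w_client, grad_a, lr=0.1):
    """Client completes backprop using only the returned activation grad."""
    return w_client - lr * grad_a * x

w_c, w_s = 0.5, 0.5
x, y = 1.0, 2.0
for _ in range(50):
    a = client_forward(x, w_c)
    w_s, g = server_step(a, y, w_s)
    w_c = client_backward(x, w_c, g)
print(round(w_c * w_s, 3))  # product approaches the target y = 2.0
```

Note that every batch costs one activation upload and one gradient download; the EfficientFSL mechanisms below attack exactly this per-batch traffic.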
EfficientFSL frameworks (e.g., CSE-FSL (Mu et al., 2023), FSL-SAGE (2505.23182)) address these limitations by introducing local loss computation with auxiliary networks, event-driven or sparse activation transmission, and global model aggregation to reduce both communication and server storage. Separately, EfficientFSL in few-shot learning (Liao et al., 13 Jan 2026) focuses on adapting large pre-trained models using minimal parameters, aiming for competitive accuracy on N-way K-shot tasks with limited computational resources.
2. Architectural Strategies
In federated split learning, the typical model architecture is decomposed as follows:
| Component | Location | Role |
|---|---|---|
| Client-side Model | Edge device | Extracts local features; lightweight |
| Auxiliary Network | Edge device | Computes local surrogate loss; small size |
| Server-side Model | Central server | Receives smashed activations; final updates |
In "EfficientFSL" for ViTs, the architecture includes:
- Forward Block: Synthesizes task-specific queries using trainable prompts and bottleneck projections over the frozen ViT backbone.
- Combine Block: Fuses multi-layer outputs via shared alignment and conditional weighting.
- Support-Query Attention Block: Adjusts class prototypes to mitigate domain shifts between support and query sets.
The modular insertion of trainable blocks allows query-only adaptation, isolating nearly all learning to a minimal set of new parameters (1–2M vs. tens of millions in full fine-tuning) (Liao et al., 13 Jan 2026).
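The parameter economy of query-only adaptation comes from small residual bottleneck projections over a frozen backbone. The sketch below is a hypothetical illustration of that idea, not the paper's exact blocks: a frozen d-dimensional feature passes through a trainable down/up projection pair, so only 2·d·r weights are tuned; zero-initializing the up-projection makes the adapter start as an identity map.

```python
import random

# Hypothetical bottleneck-adapter sketch (names and init scheme assumed).
def make_bottleneck(d, r, seed=0):
    rng = random.Random(seed)
    down = [[rng.gauss(0.0, 0.02) for _ in range(r)] for _ in range(d)]
    up = [[0.0] * d for _ in range(r)]  # zero-init: adapter starts as identity
    return down, up

def adapt(feat, down, up):
    """Residual bottleneck: feat + up(down(feat))."""
    d, r = len(down), len(down[0])
    hidden = [sum(feat[i] * down[i][j] for i in range(d)) for j in range(r)]
    delta = [sum(hidden[j] * up[j][k] for j in range(r)) for k in range(d)]
    return [f + dl for f, dl in zip(feat, delta)]

d, r = 384, 16            # ViT-S feature width; small bottleneck rank (assumed)
down, up = make_bottleneck(d, r)
trainable = d * r + r * d
print(trainable)          # 12288 adapter weights vs ~21.7M frozen backbone params
```

With a handful of such blocks plus prompts, the trainable budget stays in the 1–2M range cited above.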
3. Communication and Storage Efficiency
EfficientFSL strategies in federated split learning decrease bandwidth and server storage via multiple mechanisms:
- Sparse Transmission: Clients upload smashed data only once every h batches, rather than every batch, with h adjustable to trade off compute against communication cost (Mu et al., 2023, Mu et al., 21 Jul 2025).
- Single-Server Model: The server maintains just one global server-side model and one auxiliary net shared by all clients, rather than one copy per client (Mu et al., 2023, Mu et al., 21 Jul 2025).
Empirical results demonstrate up to 95% communication reduction (CSE-FSL: 9.5 GB vs. 172 GB) and 70% server storage reduction compared to vanilla FSL (Mu et al., 21 Jul 2025).
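The effect of the sparse-upload schedule can be made concrete with a short sketch (the interval symbol h follows the CSE-FSL usage in the results table below; the function name is illustrative):

```python
# Sketch of the sparse-upload schedule: smashed activations are sent
# only on batches that fall on the interval h.
def upload_schedule(num_batches, h):
    """Return the batch indices at which smashed activations are uploaded."""
    return [b for b in range(num_batches) if b % h == 0]

full = upload_schedule(100, 1)    # vanilla FSL: upload every batch
sparse = upload_schedule(100, 5)  # CSE-FSL-style schedule with h = 5
print(len(full), len(sparse))     # 100 20 -> 80% fewer uploads
```

Larger h saves more bandwidth but delays server feedback, which is the compute/communication trade-off noted above.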
In the few-shot EfficientFSL (Liao et al., 13 Jan 2026):
- Freezing the backbone means only the small adaptive blocks are tuned.
- For ViT-S/16, trainable parameters drop from 21.7M (full fine-tuning) to 1.25M, peak GPU memory to 0.49 GB, and per-epoch training time to 23.6 s.
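A simple parameter accounting makes the footprint reduction explicit. The backbone and total-trainable figures come from the text; the split of the 1.25M across the three blocks is an assumption for illustration only.

```python
# Illustrative parameter accounting under query-only tuning.
# (name, (param_count, is_trainable)); the per-block split is assumed.
params = {
    "vit_s_backbone": (21_700_000, False),   # frozen, per the text
    "forward_block": (600_000, True),        # assumed
    "combine_block": (400_000, True),        # assumed
    "support_query_attn": (250_000, True),   # assumed
}
trainable = sum(n for n, t in params.values() if t)
total = sum(n for n, _ in params.values())
print(trainable, round(trainable / total * 100, 1))  # 1250000 5.4
```

Roughly 5% of the deployed parameters receive gradients, which is what drives the memory and per-epoch time figures above.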
4. Learning and Optimization Algorithms
Federated EfficientFSL frameworks implement:
- Local Surrogate Loss: Each client computes a local surrogate loss via the auxiliary network, updating client-side parameters without immediate server feedback (Mu et al., 2023, Mu et al., 21 Jul 2025).
- Periodic Aggregation: Client-side models are aggregated into a global model and the server-side model is updated after each epoch; server feedback is batched to reduce round-trip latency.
- Auxiliary Model Estimation: In FSL-SAGE, auxiliary models are trained to imitate server-side backward gradients, allowing asynchronous local updates; auxiliary models are aligned only infrequently, amortizing communication (2505.23182).
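The local-surrogate idea in the first bullet can be sketched with the same scalar toy model used earlier. This is a hypothetical illustration, not the papers' formulation: a small auxiliary head maps the smashed activation to a prediction locally, so the client can take gradient steps with no server round-trip at all.

```python
# Toy sketch of a local surrogate update (scalar model, assumed setup):
# w_aux plays the role of the auxiliary network on the edge device.
def local_step(x, y, w_client, w_aux, lr=0.05):
    a = w_client * x                  # smashed activation (never uploaded here)
    pred = w_aux * a                  # auxiliary head's local prediction
    d = 2.0 * (pred - y)              # d/dpred of squared error
    w_aux = w_aux - lr * d * a
    w_client = w_client - lr * d * w_aux * x   # only local quantities used
    return w_client, w_aux

w_c, w_a = 0.5, 0.5
for _ in range(200):
    w_c, w_a = local_step(1.0, 2.0, w_c, w_a)
print(round(w_c * w_a, 2))  # composed prediction approaches the target 2.0
```

In the real frameworks the server-side model is still updated from periodically uploaded activations; the surrogate only decouples the client's step-by-step progress from that round-trip.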
EfficientFSL for ViTs introduces:
- Query-Only Tuning: Only trainable modules synthesize, aggregate, and align features; the backbone remains frozen.
- Support-Query Attention: Class prototypes are adaptively shifted toward query clusters, and prediction uses cosine similarity to these adjusted prototypes (Liao et al., 13 Jan 2026).
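The prototype-adjustment step can be sketched as follows. The attention-based shift is summarized here as a simple convex move toward the query-set mean, which is an assumption for illustration, not the paper's exact operator; the cosine-similarity classification matches the description above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def shift_prototypes(protos, query_feats, alpha=0.3):
    """Move each class prototype a fraction alpha toward the query mean
    (a stand-in for the learned support-query attention shift)."""
    d = len(protos[0])
    mean = [sum(q[i] for q in query_feats) / len(query_feats) for i in range(d)]
    return [[(1 - alpha) * p[i] + alpha * mean[i] for i in range(d)] for p in protos]

def classify(query, protos):
    """Predict the class whose prototype is most cosine-similar."""
    sims = [cosine(query, p) for p in protos]
    return max(range(len(protos)), key=lambda k: sims[k])

protos = [[1.0, 0.0], [0.0, 1.0]]           # 2-way prototypes (toy features)
queries = [[0.9, 0.2], [0.1, 1.1]]
adjusted = shift_prototypes(protos, queries)
print([classify(q, adjusted) for q in queries])  # -> [0, 1]
```

The shift mitigates support/query domain gaps: prototypes are pulled into the region the query features actually occupy before similarities are scored.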
5. Theoretical Guarantees
The convergence properties of EfficientFSL frameworks are formalized under standard FL assumptions: L-smoothness, bounded gradient variance, and (in CSE-FSL) mild distribution drift conditions.
- Convergence Rate: Both CSE-FSL and FSL-SAGE converge to a first-order stationary point at a rate of O(1/sqrt(T)) for non-convex objectives, matching FedAvg (Mu et al., 2023, 2505.23182, Mu et al., 21 Jul 2025).
- Auxiliary Model Error: In FSL-SAGE, the misalignment between local auxiliary gradient estimation and global split gradient is provably controlled assuming PAC-learnability of the auxiliary function class, yielding the same rate (2505.23182).
A plausible implication is that split neural training on edge clients can attain communication/compute parity with classic FL, provided the auxiliary strategies are well tuned.
6. Empirical Performance and Applications
Quantitative experiments span CIFAR-10, FEMNIST, and cross-domain benchmarks:
| Method | CIFAR-10 Acc (%) | Comm. Load (GB) | Server Storage (M params) |
|---|---|---|---|
| FSL_MC | 80.55 | 172.46 | 5.34 |
| FSL_AN | 77.75 | 93.96 | 5.46 |
| CSE-FSL (h=5) | 76.52 | 18.14 | 1.61 |
- On CIFAR-10 and FEMNIST, CSE-FSL achieves an 80% reduction in smashed-data communication, with server storage that remains constant as the client count grows (Mu et al., 2023, Mu et al., 21 Jul 2025).
- FSL-SAGE achieves top accuracy (85.7% on CIFAR-10 iid, 82.8% niid) with 2.2× less communication than CSE-FSL and substantially outperforms SplitFed and FedAvg under resource constraints (2505.23182).
In few-shot tasks using EfficientFSL (Liao et al., 13 Jan 2026), state-of-the-art results are attained:
| Dataset | ViT-S Acc (1-shot/5-shot) | ViT-B Acc (1-shot/5-shot) |
|---|---|---|
| miniImageNet | 97.40 / 99.05 | 98.34 / 99.12 |
| tieredImageNet | 89.72 / 95.41 | 93.27 / 96.78 |
| CIFAR | 88.82 / 94.60 | 93.25 / 97.28 |
| FC100 | 69.94 / 81.68 | 80.13 / 88.81 |
Results hold across six cross-domain datasets, outperforming specialized meta-learners by 3–6% absolute (Liao et al., 13 Jan 2026).
7. Practical Considerations and Limitations
Key hyperparameters (e.g., the upload interval h, auxiliary model size) trade off memory and bandwidth against convergence and accuracy. Design and tuning remain task dependent, especially for auxiliary architectures. Theoretical guarantees typically assume IID data and full participation, while practical deployments routinely contend with distributional drift and client heterogeneity.
- In CSE-FSL, an excessively large upload interval h may slow learning for low-data clients, while asynchrony and event-triggered server updates do not significantly degrade accuracy (Mu et al., 21 Jul 2025).
- Auxiliary estimation errors in FSL-SAGE are controlled theoretically but demand empirical validation under adversarial or highly non-IID splits (2505.23182).
- EfficientFSL for few-shot classification does not alter privacy guarantees beyond those of vanilla split learning; further security considerations are not addressed (Mu et al., 2023, Liao et al., 13 Jan 2026).
EfficientFSL thus denotes a robust set of split learning frameworks capable of training large models over resource-constrained devices, supporting both federated and few-shot learning paradigms with strong theoretical and empirical efficiency.