- The paper presents a baseline federated learning system for wearable fitness tracking that reduces communication rounds by 13% with only a 1% drop in mean F1-score.
- It leverages the Flower framework and the lightweight TinyHAR model to efficiently process real-world inertial sensor data for human activity recognition.
- The study shows that implementing client-side early stopping can enhance local model personalization, evidenced by significant F1-score gains in several client cases.
FedFitTech: A Baseline in Federated Learning for Fitness Tracking
FedFitTech is a proposed baseline system for federated learning (FL) tailored to wearable fitness tracking (FitTech) applications. Built on the Flower framework, it specifically addresses the unique constraints and opportunities of decentralized human activity recognition (HAR) with inertial sensor data from wearables. The work contributes a reproducible open-source implementation and investigates the implications of client-side early stopping for communication efficiency and model personalization.
Context and Motivation
Wearable devices equipped with inertial sensors generate longitudinal datasets relevant for HAR and activity-level feedback within FitTech. However, traditional centralized machine learning approaches are increasingly infeasible due to:
- Stringent data privacy regulations (GDPR, CCPA).
- Scalability limitations as the device/user base grows.
- Communication inefficiencies and user reluctance to share raw sensor data.
Federated learning offers an alternative: decentralized model training in which only model updates, not raw data, are communicated, preserving data locality and privacy. While FL has found adoption in domains such as smartphone text input, its systematic application to fitness tracking remains underexplored, especially regarding real-world sensor and usage heterogeneity, infrequent communication opportunities, and the balance between model generalization and user-level personalization.
Core Methodology
FedFitTech leverages the Flower framework to instantiate a standard FL pipeline suitable for resource-constrained devices with similar sensing and compute profiles (e.g., smartwatches):
- Model: TinyHAR, a lightweight neural architecture that combines convolutional layers, a transformer-style cross-channel attention block, an LSTM for temporal modeling, and a final self-attention layer. It is designed to deliver competitive performance on inertial HAR with minimal compute and memory footprint.
- Dataset: The WEAR dataset, covering 22 participants (mapped to 24 clients for the experiment) with extensive annotation of fitness activities and null periods. Its outdoor, real-world recording context and diverse activity classes make it well suited for validating the baseline.
- FL Aggregation: Standard FedAvg is used for server-side aggregation.
- Training Setup: Each client corresponds to one user and holds a local 80%/20% train/test split produced by time-series-aware partitioning; training uses a sliding window of 100 samples, batch size 32, the Adam optimizer with a 0.001 learning rate, 1 local epoch per round, and 100 global rounds.
- Framework Features: Flower orchestrates communication, supports scalability, and facilitates large-scale emulation.
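The server-side aggregation step named above (FedAvg) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `fedavg` and its flat-list parameter representation are assumptions; each client is taken to report its updated parameters together with its local sample count.

```python
def fedavg(client_weights, client_sizes):
    """Server-side FedAvg: average each parameter across clients,
    weighted by each client's local training-set size.

    client_weights: one flat list of parameters per client
    client_sizes: number of local training examples per client
    """
    total = float(sum(client_sizes))
    n_params = len(client_weights[0])
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(n_params)
    ]

# Two toy clients with a single parameter each: the client holding
# three times as much data pulls the average three times as hard.
avg = fedavg([[0.0], [4.0]], [1, 3])  # weighted mean: 0.25*0.0 + 0.75*4.0
```

In a real Flower deployment this weighting is handled by the built-in FedAvg strategy; the sketch only makes the arithmetic explicit.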
Client-Side Early Stopping: Case Study
The core experimental extension implements client-side early stopping, motivated by the principle that once a user's local model stops improving (measured by validation F1-score stability over a window of rounds), further participation in global rounds is redundant and energy-inefficient. The authors adopt a straightforward F1-score-based dropout criterion (patience = 5, improvement threshold = 0.01).
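The criterion can be captured in a small stateful helper. This is a hedged sketch of the stated rule (patience = 5, threshold = 0.01); the class name `F1EarlyStopper` and its exact bookkeeping are illustrative assumptions, not the paper's code.

```python
class F1EarlyStopper:
    """Client-side early stopping on validation F1-score stability:
    a client drops out once its F1-score has not improved by more
    than `threshold` for `patience` consecutive rounds."""

    def __init__(self, patience=5, threshold=0.01):
        self.patience = patience
        self.threshold = threshold
        self.best_f1 = 0.0
        self.stale_rounds = 0

    def update(self, val_f1):
        """Record this round's validation F1; return True if the
        client should stop participating in global rounds."""
        if val_f1 > self.best_f1 + self.threshold:
            self.best_f1 = val_f1
            self.stale_rounds = 0
        else:
            self.stale_rounds += 1
        return self.stale_rounds >= self.patience
```

A client that plateaus early thus leaves the federation after five flat rounds, which is what produces the communication savings reported below.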
Results
- Communication Reduction: Early stopping reduced the total communication rounds by 13% without sacrificing the global model's capacity for generalization.
- Recognition Performance: Mean F1-score decreased by only 1% (from 68% to 67%); notably, 11 out of 24 clients saw improved local F1-scores under early stopping.
- Class/Client Analysis: Certain activities and client cases saw notable F1-score increases due to better preservation of local patterns (e.g., client 2’s push-up recognition jumped from 0% to 76%).
- Participation Dynamics: 37.5% of clients dropped out early, reflecting realistic attrition in FL as models reach local convergence at different rates.
Technical and Practical Implications
The FedFitTech baseline demonstrates that:
- The FitTech domain—with its homogeneous sensors, moderate compute, and infrequent but meaningful communication opportunities—is particularly well-suited to FL.
- Simple personalization mechanisms such as client-side early stopping are effective at reducing communication and energy usage, with negligible or even positive effects on client-level accuracy in some cases.
- Using real-world, multimodal datasets with numerous classes and participants provides realistic benchmarking and uncovers activity-specific performance nuances.
Key practical recommendations derived from the study:
- Early Stopping Implementation: Integrate F1-score-stability-based early stopping in FL settings, especially for scenarios with non-stationary local data, to conserve resources.
- Dataset Splitting: When working with temporal HAR data, favor time-series-aware train-test partitioning to simulate realistic activity pattern drift.
- Model Choice: For deployment on wearables, prioritize architectures (such as TinyHAR) explicitly designed for low-resource operation while sustaining HAR accuracy.
- Baseline Availability: Open-sourcing FL baselines accelerates reproducibility and comparability in the research community.
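The splitting recommendation above can be made concrete. A minimal sketch, assuming windows are extracted in temporal order; the window size of 100 matches the paper's setup, while the 50-sample stride and both function names are illustrative assumptions.

```python
def sliding_windows(signal, size=100, stride=50):
    """Cut a sensor stream into fixed-length windows,
    preserving temporal order."""
    return [signal[i:i + size]
            for i in range(0, len(signal) - size + 1, stride)]

def chronological_split(windows, train_frac=0.8):
    """Time-series-aware 80/20 split: the earliest windows form the
    training set, so the test set lies strictly after the training
    period instead of being interleaved with it."""
    cut = int(len(windows) * train_frac)
    return windows[:cut], windows[cut:]
```

A random shuffle before splitting would leak overlapping windows across the train/test boundary; a chronological cut avoids that and better reflects activity-pattern drift over time.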
Limitations and Future Directions
While FedFitTech leverages a representative dataset and practical FL setup, the present baseline focuses exclusively on inertial data and a single aggregation method. There is no exploration of:
- Differential privacy or secure aggregation mechanisms.
- Multimodal federated learning (e.g., fusion of egocentric video and sensor data).
- Transfer learning or cross-domain adaptation in FL for HAR.
- Advanced aggregation strategies beyond FedAvg (e.g., cluster-wise, personalized federated optimization).
Subsequent work should also extend to large-scale, real-world deployments, examine robustness to participation heterogeneity, and explore strategies for improving recognition of rare or complex activities.
Theoretical and Broader AI Implications
The findings highlight several avenues of theoretical and practical importance:
- FL’s efficacy in balancing privacy with personalization in domains characterized by user-specific, temporally-evolving data streams.
- The emergence of early-stopping and resource-aware participation as first-order design choices in FL systems, relevant across domains with battery-constrained clients.
- The utility of realistic, diverse public datasets (like WEAR) as community benchmarks for FL in HAR.
The open-source nature of FedFitTech—bridging reproducible research and deployment—positions it as a reference point for future work in privacy-preserving, scalable FitTech AI systems. As hardware capabilities and privacy requirements continue to evolve, the methodologies established here will likely underpin broader applications at the intersection of wearable tech, decentralized learning, and personalized health.