- The paper presents a baseline federated learning system for wearable fitness tracking that reduces communication rounds by 13% with only a 1% drop in mean F1-score.
- It leverages the Flower framework and the lightweight TinyHAR model to efficiently process real-world inertial sensor data for human activity recognition.
- The study shows that implementing client-side early stopping can enhance local model personalization, evidenced by significant F1-score gains in several client cases.
FedFitTech: A Baseline in Federated Learning for Fitness Tracking
FedFitTech is a proposed baseline system for federated learning (FL) tailored to wearable fitness tracking (FitTech) applications. Built on the Flower framework, it specifically addresses the unique constraints and opportunities of decentralized human activity recognition (HAR) with inertial sensor data from wearables. The work contributes a reproducible open-source implementation and investigates the implications of client-side early stopping for communication efficiency and model personalization.
Context and Motivation
Wearable devices equipped with inertial sensors generate longitudinal datasets relevant for HAR and activity-level feedback within FitTech. However, traditional centralized machine learning approaches are increasingly infeasible due to:
- Stringent data privacy regulations (GDPR, CCPA).
- Scalability limitations as the device/user base grows.
- Communication inefficiencies and user reluctance to share raw sensor data.
Federated learning offers an alternative: decentralized model training in which only model updates, not raw data, are communicated, preserving data locality and privacy. While FL has found adoption in domains such as smartphone text input, its systematic application to fitness tracking remains underexplored, especially regarding real-world sensor and usage heterogeneity, infrequent communication opportunities, and the balance between model generalization and user-level personalization.
Core Methodology
FedFitTech leverages the Flower framework to instantiate a standard FL pipeline suitable for resource-constrained devices with similar sensing and compute profiles (e.g., smartwatches):
- Model: TinyHAR, a lightweight neural architecture that combines convolutional layers, a transformer-style cross-channel attention block, an LSTM for temporal modeling, and a final self-attention layer. It is designed to deliver competitive performance on inertial HAR with minimal compute and memory footprint.
- Dataset: The WEAR dataset, covering 22 participants (mapped to 24 clients for the experiment) with extensive annotation of fitness activities and null periods. Its outdoor, real-world recording context and diverse activity classes make it well suited for validating the baseline.
- FL Aggregation: Standard FedAvg is used for server-side aggregation.
- Training Setup: Each client corresponds to one user and holds a local 80%/20% train/test split produced by time-series-aware partitioning; training uses a sliding window of 100 samples, batch size 32, the Adam optimizer with a 0.001 learning rate, 1 local epoch per round, and 100 global rounds.
- Framework Features: Flower orchestrates communication, supports scalability, and facilitates large-scale emulation.
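The server-side aggregation step named above (FedAvg) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `fedavg` and its flat-list parameter representation are assumptions; each client is taken to report its updated parameters together with its local sample count.

```python
def fedavg(client_weights, client_sizes):
    """Server-side FedAvg: average each parameter across clients,
    weighted by each client's local training-set size.

    client_weights: one flat list of parameters per client
    client_sizes: number of local training examples per client
    """
    total = float(sum(client_sizes))
    n_params = len(client_weights[0])
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(n_params)
    ]

# Two toy clients with a single parameter each: the client holding
# three times as much data pulls the average three times as hard.
avg = fedavg([[0.0], [4.0]], [1, 3])  # weighted mean: 0.25*0.0 + 0.75*4.0
```

In a real Flower deployment this weighting is handled by the built-in FedAvg strategy; the sketch only makes the arithmetic explicit.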
Client-Side Early Stopping: Case Study
The core experimental extension implements client-side early stopping, motivated by the principle that once a user's local model stops improving (measured by validation F1-score stability over a window of rounds), further participation in global rounds is redundant and energy-inefficient. The authors adopt a straightforward F1-score-based dropout criterion (patience = 5, improvement threshold = 0.01).
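The criterion can be captured in a small stateful helper. This is a hedged sketch of the stated rule (patience = 5, threshold = 0.01); the class name `F1EarlyStopper` and its exact bookkeeping are illustrative assumptions, not the paper's code.

```python
class F1EarlyStopper:
    """Client-side early stopping on validation F1-score stability:
    a client drops out once its F1-score has not improved by more
    than `threshold` for `patience` consecutive rounds."""

    def __init__(self, patience=5, threshold=0.01):
        self.patience = patience
        self.threshold = threshold
        self.best_f1 = 0.0
        self.stale_rounds = 0

    def update(self, val_f1):
        """Record this round's validation F1; return True if the
        client should stop participating in global rounds."""
        if val_f1 > self.best_f1 + self.threshold:
            self.best_f1 = val_f1
            self.stale_rounds = 0
        else:
            self.stale_rounds += 1
        return self.stale_rounds >= self.patience
```

A client that plateaus early thus leaves the federation after five flat rounds, which is what produces the communication savings reported below.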
Results
- Communication Reduction: Early stopping reduced the total communication rounds by 13% without sacrificing the global model's capacity for generalization.
- Recognition Performance: Mean F1-score decreased by only 1% (from 68% to 67%); notably, 11 out of 24 clients saw improved local F1-scores under early stopping.
- Class/Client Analysis: Certain activities and client cases saw notable F1-score increases due to better preservation of local patterns (e.g., client 2’s push-up recognition jumped from 0% to 76%).
- Participation Dynamics: 37.5% of clients dropped out early, reflecting realistic attrition in FL as models reach local convergence at different rates.
Technical and Practical Implications
The FedFitTech baseline demonstrates that:
- The FitTech domain—with its homogeneous sensors, moderate compute, and infrequent but meaningful communication opportunities—is particularly well-suited to FL.
- Simple personalization mechanisms such as client-side early stopping are effective at reducing communication and energy usage, with negligible or even positive effects on client-level accuracy in some cases.
- Using real-world, multimodal datasets with numerous classes and participants provides realistic benchmarking and uncovers activity-specific performance nuances.
Key practical recommendations derived from the study:
- Early Stopping Implementation: Integrate F1-score-stability-based early stopping in FL settings, especially for scenarios with non-stationary local data, to conserve resources.
- Dataset Splitting: When working with temporal HAR data, favor time-series-aware train-test partitioning to simulate realistic activity pattern drift.
- Model Choice: For deployment on wearables, prioritize architectures (such as TinyHAR) explicitly designed for low-resource operation while sustaining HAR accuracy.
- Baseline Availability: Open-sourcing FL baselines accelerates reproducibility and comparability in the research community.
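The splitting recommendation above can be made concrete. A minimal sketch, assuming windows are extracted in temporal order; the window size of 100 matches the paper's setup, while the 50-sample stride and both function names are illustrative assumptions.

```python
def sliding_windows(signal, size=100, stride=50):
    """Cut a sensor stream into fixed-length windows,
    preserving temporal order."""
    return [signal[i:i + size]
            for i in range(0, len(signal) - size + 1, stride)]

def chronological_split(windows, train_frac=0.8):
    """Time-series-aware 80/20 split: the earliest windows form the
    training set, so the test set lies strictly after the training
    period instead of being interleaved with it."""
    cut = int(len(windows) * train_frac)
    return windows[:cut], windows[cut:]
```

A random shuffle before splitting would leak overlapping windows across the train/test boundary; a chronological cut avoids that and better reflects activity-pattern drift over time.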
Limitations and Future Directions
While FedFitTech leverages a representative dataset and practical FL setup, the present baseline focuses exclusively on inertial data and a single aggregation method. There is no exploration of:
- Differential privacy or secure aggregation mechanisms.
- Multimodal federated learning (e.g., fusion of egocentric video and sensor data).
- Transfer learning or cross-domain adaptation in FL for HAR.
- Advanced aggregation strategies beyond FedAvg (e.g., cluster-wise, personalized federated optimization).
Subsequent work should also extend to large-scale, real-world deployments, examine robustness to participation heterogeneity, and explore strategies for improving recognition of rare or complex activities.
Theoretical and Broader AI Implications
The findings highlight several avenues of theoretical and practical importance:
- FL’s efficacy in balancing privacy with personalization in domains characterized by user-specific, temporally-evolving data streams.
- The emergence of early-stopping and resource-aware participation as first-order design choices in FL systems, relevant across domains with battery-constrained clients.
- The utility of realistic, diverse public datasets (like WEAR) as community benchmarks for FL in HAR.
The open-source nature of FedFitTech—bridging reproducible research and deployment—positions it as a reference point for future work in privacy-preserving, scalable FitTech AI systems. As hardware capabilities and privacy requirements continue to evolve, the methodologies established here will likely underpin broader applications at the intersection of wearable tech, decentralized learning, and personalized health.