Robust Federated Learning with Confidence-Weighted Filtering and GAN-Based Completion under Noisy and Incomplete Data
The paper addresses prevalent challenges in federated learning (FL), particularly the noisy, imbalanced, and incomplete data that often arise in decentralized settings. The authors propose a robust federated learning framework that integrates three key components: a confidence-weighted filtering mechanism, a collaborative conditional GAN (cGAN) training process, and robust federated optimization strategies. This comprehensive approach aims to enhance model performance while maintaining privacy across decentralized client datasets.
Federated learning allows multiple clients to collaboratively train a global model without sharing their local data, thus addressing privacy concerns. However, the effectiveness of FL systems can be severely hampered by data quality issues intrinsic to the real-world data collected by distributed clients. Such issues include label noise, missing class samples, and imbalanced data distributions, often leading to model degradation and poor generalization. Addressing these challenges requires a more robust approach to preserve the integrity and utility of the aggregated model.
Methodology Overview
The proposed methodology is divided into three sequential stages, each targeting a specific aspect of data quality:
- Local Noise Cleaning: This involves each client implementing a confidence-weighted filtering mechanism to identify and exclude mislabeled samples from the local dataset. The process leverages a combination of entropy-based, margin-based, and clustering-based confidence scores, along with adaptive thresholds, to maintain data quality.
- Federated Conditional GAN Training: Clients collaboratively train lightweight cGANs using their refined datasets. This process follows a federated averaging protocol in which only model parameters, not raw data, are shared, preserving privacy while enabling synthetic data generation for missing classes.
- Data Completion and Federated Training: Clients utilize the trained GAN models to generate synthetic samples to address data sparsity issues — specifically missing classes. These samples are then integrated into local datasets to balance class distributions, aiding improved convergence and generalization during the final global model training using either FedAvg or FedProx.
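The first stage's filtering step can be illustrated with a minimal sketch. The paper describes combining entropy-based, margin-based, and clustering-based confidence scores with adaptive thresholds; the version below combines only the entropy and margin scores with equal weights and uses a quantile cutoff as the adaptive threshold. The exact weighting and thresholding are assumptions, not the authors' specification.

```python
import numpy as np

def confidence_filter(probs, labels, drop_quantile=0.2):
    """Flag likely-mislabeled samples by combining entropy- and
    margin-based confidence scores (equal weights are an
    illustrative assumption). Returns a boolean keep-mask."""
    eps = 1e-12
    n, k = probs.shape
    # Entropy-based confidence: low predictive entropy -> high confidence.
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    conf_entropy = 1.0 - entropy / np.log(k)
    # Margin-based confidence: gap between the probability assigned to
    # the observed label and the strongest competing class.
    idx = np.arange(n)
    p_label = probs[idx, labels]
    competitors = probs.copy()
    competitors[idx, labels] = -np.inf
    margin = p_label - competitors.max(axis=1)
    conf_margin = (margin + 1.0) / 2.0  # map [-1, 1] into [0, 1]
    score = 0.5 * conf_entropy + 0.5 * conf_margin
    # Adaptive threshold: drop the lowest-scoring fraction of samples.
    threshold = np.quantile(score, drop_quantile)
    return score >= threshold
```

In a full pipeline, `probs` would come from the client's local model evaluated on its own data, so samples whose observed labels the model consistently contradicts receive low scores and are excluded before cGAN training.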
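The federated averaging step used in both the cGAN stage and the final training stage reduces to a sample-size-weighted average of client parameters. A minimal sketch of that aggregation (layer shapes and client counts here are illustrative):

```python
import numpy as np

def fedavg_aggregate(client_params, client_sizes):
    """FedAvg aggregation: average each parameter array across
    clients, weighted by local dataset size. Only parameters are
    exchanged, never raw samples, which is the privacy argument
    made for the collaborative cGAN training."""
    total = float(sum(client_sizes))
    n_layers = len(client_params[0])
    aggregated = []
    for layer in range(n_layers):
        acc = np.zeros_like(client_params[0][layer], dtype=float)
        for params, n_samples in zip(client_params, client_sizes):
            acc += (n_samples / total) * params[layer]
        aggregated.append(acc)
    return aggregated
```

The server broadcasts the aggregated parameters back to clients each round; FedProx differs only in adding a proximal term to each client's local objective that penalizes drift from the global model.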
Experimental Evaluation
The framework's efficacy was validated on the MNIST and Fashion-MNIST datasets under varying levels of label noise and class imbalance. Results showed substantial improvements in data quality metrics and classification performance relative to baseline federated learning models. The combined approach of confidence-based filtering and conditional GAN augmentation yielded significant gains in macro-F1 scores, affirming the effectiveness of this tailored, comprehensive strategy.
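Macro-F1 is the natural headline metric here because it weights every class equally, so recovering minority or previously missing classes moves the score visibly. A minimal reference implementation:

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Macro-averaged F1: compute per-class F1 independently,
    then average with equal weight per class, making the metric
    sensitive to minority-class performance under imbalance."""
    f1_scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return float(np.mean(f1_scores))
```

Unlike accuracy, a classifier that ignores a rare class entirely is penalized with a per-class F1 of zero for that class, which is exactly the failure mode the GAN-based completion targets.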
Key Contributions and Implications
The paper presents several critical contributions to the federated learning landscape:
- It introduces a multifaceted pipeline addressing pervasive data quality issues in federated settings, enhancing the robustness, scalability, and privacy compliance of FL systems.
- The method leverages collaborative GANs to generate class-specific synthetic data, effectively mitigating the impact of missing classes and balancing data distributions.
- The use of confidence-weighted filtering improves local dataset quality before federated aggregation, directly contributing to better model performance.
This work holds significant implications for the practical application of federated learning, particularly in environments where data privacy is paramount, such as healthcare and finance. Moreover, the robust framework sets a precedent for future developments, encouraging further research into integrating generative models and other advanced strategies to improve federated learning efficacy under real-world conditions.
Future Directions
Despite promising results, the framework faces limitations concerning computational resources, particularly for resource-constrained edge devices. The authors acknowledge the need for further research into model compression and efficient deployment strategies. Additionally, exploring more sophisticated privacy mechanisms could enhance compliance with stringent data protection regulations. The paper lays a foundation for future efforts focused on real-world FL applications, advocating for continuous refinement and innovation in handling data quality challenges.