Multi-site fMRI Analysis Using Privacy-preserving Federated Learning and Domain Adaptation: ABIDE Results (2001.05647v3)

Published 16 Jan 2020 in cs.LG and eess.IV

Abstract: Deep learning models have shown their advantage in many different tasks, including neuroimage analysis. However, to effectively train a high-quality deep learning model, the aggregation of a significant amount of patient information is required. The time and cost for acquisition and annotation in assembling, for example, large fMRI datasets make it difficult to acquire large numbers at a single site. However, due to the need to protect the privacy of patient data, it is hard to assemble a central database from multiple institutions. Federated learning allows for population-level models to be trained without centralizing entities' data by transmitting the global model to local entities, training the model locally, and then averaging the gradients or weights in the global model. However, some studies suggest that private information can be recovered from the model gradients or weights. In this work, we address the problem of multi-site fMRI classification with a privacy-preserving strategy. To solve the problem, we propose a federated learning approach, where a decentralized iterative optimization algorithm is implemented and shared local model weights are altered by a randomization mechanism. Considering the systemic differences of fMRI distributions from different sites, we further propose two domain adaptation methods in this federated learning formulation. We investigate various practical aspects of federated model optimization and compare federated learning with alternative training strategies. Overall, our results demonstrate that it is promising to utilize multi-site data without data sharing to boost neuroimage analysis performance and find reliable disease-related biomarkers. Our proposed pipeline can be generalized to other privacy-sensitive medical data analysis problems.

Authors (6)
  1. Xiaoxiao Li (144 papers)
  2. Yufeng Gu (4 papers)
  3. Nicha Dvornek (8 papers)
  4. Lawrence Staib (13 papers)
  5. Pamela Ventola (17 papers)
  6. James S. Duncan (67 papers)
Citations (315)

Summary

Overview of Multi-site fMRI Analysis Using Privacy-preserving Federated Learning

This paper presents a novel framework for analyzing multi-site functional magnetic resonance imaging (fMRI) data using privacy-preserving federated learning and domain adaptation, specifically applied to the Autism Brain Imaging Data Exchange (ABIDE) dataset. The primary goal is to enhance neuroimage analysis performance and identify reliable disease-related biomarkers without compromising patient data privacy.

The authors address key challenges in neuroimage analysis, such as the difficulty of aggregating large-scale datasets at a single site due to time, cost, and privacy constraints. Federated learning is employed because it trains a shared global model without centralizing data: each local entity trains on its own dataset and shares only model updates. However, concerns remain about the potential leakage of private information from these shared model updates.
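
As a concrete illustration, the following is a minimal sketch of one federated round of the kind described above, assuming a toy logistic-regression model and NumPy weight vectors. The function names (`local_update`, `federated_round`) and all hyperparameters are hypothetical, not taken from the paper.

```python
# Minimal federated-averaging sketch: each site trains on its own private data
# and only the resulting weight vectors are shared and averaged.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """A few epochs of logistic-regression SGD on one site's private data."""
    w = weights.copy()
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (probs - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_weights, site_datasets):
    """Send the global model to every site, train locally, average the results."""
    local_weights = [local_update(global_weights, X, y) for X, y in site_datasets]
    return np.mean(local_weights, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 20
    # Three hypothetical sites with different sample sizes; data is never pooled.
    sites = [(rng.normal(size=(n, dim)), rng.integers(0, 2, size=n).astype(float))
             for n in (40, 60, 80)]
    w_global = np.zeros(dim)
    for _ in range(10):
        w_global = federated_round(w_global, sites)
```

Only the weight vectors cross site boundaries; the raw features and labels stay local, which is the property the shared-update privacy concern refers to.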

Methodological Contributions

  1. Federated Learning Framework: The authors propose a federated learning setup that implements a decentralized iterative optimization algorithm. Local model weights are randomized before being shared to preserve privacy (see the weight-randomization sketch after this list). The approach assumes a horizontal federated learning scheme in which the sites share the same feature space but differ in samples.
  2. Domain Adaptation Techniques: Recognizing the systemic differences in fMRI data distributions across sites, the authors integrate domain adaptation techniques into the federated learning framework. Two strategies are explored:
    • Mixture of Experts (MoE): A gating mechanism blends the shared global model with each site's private model, so that each site learns from both shared and site-specific information (see the gating sketch after this list).
    • Adversarial Domain Alignment: This technique aligns feature distributions between source and target domains via adversarial training (see the gradient-reversal sketch after this list).
  3. Evaluation through Biomarker Detection: In addition to standard accuracy metrics, the framework includes an evaluation of the detected biomarkers' reliability and informativeness by assessing their consistency and semantic correlation with known functional networks.
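
Item 1 above refers to a randomization mechanism applied to the locally trained weights before they are shared. The sketch below illustrates the general idea with zero-mean Gaussian noise; the actual noise distribution, scale, and aggregation schedule used in the paper are not reproduced here, so `noise_std` and `randomize_weights` should be read as assumptions.

```python
# Sketch of privacy-motivated weight randomization before sharing: the server
# only ever sees perturbed copies of each site's parameters.
import numpy as np

def randomize_weights(weights, noise_std=0.01, rng=None):
    """Add zero-mean Gaussian noise to model weights before they leave the site."""
    rng = rng or np.random.default_rng()
    return weights + rng.normal(scale=noise_std, size=weights.shape)

def aggregate_noisy_weights(local_weight_list, noise_std=0.01, seed=0):
    """Average the perturbed copies; independent noise partly cancels in the mean."""
    rng = np.random.default_rng(seed)
    noisy = [randomize_weights(w, noise_std, rng) for w in local_weight_list]
    return np.mean(noisy, axis=0)
```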
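
The Mixture of Experts strategy in item 2 can be pictured as a learned gate that blends a shared global expert with a site-specific private expert. The parameterization below (a per-sample sigmoid gate over linear experts) is an illustrative assumption, not the paper's architecture.

```python
# Sketch of a gated mixture of a global (shared) and a private (site-local) model.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def moe_predict(x, w_global, w_private, w_gate):
    """Blend global and private experts with a per-sample gate in [0, 1]."""
    g = sigmoid(x @ w_gate)            # how much each sample trusts the global expert
    p_global = sigmoid(x @ w_global)   # prediction of the shared expert
    p_private = sigmoid(x @ w_private) # prediction of the site-specific expert
    return g * p_global + (1.0 - g) * p_private
```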
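
The adversarial alignment strategy in item 2 can likewise be sketched with a gradient-reversal layer: a domain discriminator learns to tell which site a feature vector came from, while the reversed gradients push the feature extractor toward site-invariant features. The layer sizes, two-site setup, and reversal strength below are assumptions for illustration.

```python
# Sketch of adversarial domain alignment via gradient reversal (PyTorch).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the sign of the gradient flowing back into the feature extractor.
        return -ctx.lambd * grad_output, None

feature_extractor = nn.Sequential(nn.Linear(200, 64), nn.ReLU())  # hypothetical sizes
domain_discriminator = nn.Linear(64, 2)                           # which of two sites?

def domain_loss(x, site_labels, lambd=1.0):
    """Discriminator learns the site; reversed gradients make features site-invariant."""
    feats = feature_extractor(x)
    logits = domain_discriminator(GradReverse.apply(feats, lambd))
    return nn.functional.cross_entropy(logits, site_labels)

if __name__ == "__main__":
    x = torch.randn(8, 200)             # hypothetical per-subject feature vectors
    sites = torch.randint(0, 2, (8,))   # site label for each sample
    domain_loss(x, sites).backward()    # reversed gradients reach the extractor
```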

Experimental Results

Extensive experiments were conducted on the ABIDE dataset, involving participants from multiple sites. Key results demonstrate the framework's efficacy in improving classification performance across sites while maintaining data privacy. Notably, the federated learning approach with domain adaptation outperformed traditional single-site training and cross-site training methodologies. Additionally, the models successfully identified brain connectivity patterns that serve as potential biomarkers for autism spectrum disorders.

Implications and Future Directions

The proposed framework opens up new avenues for collaborative medical research by facilitating multi-site data utilization without violating privacy norms. This is particularly significant in medical imaging, where decentralized, privacy-preserving techniques can enable broader collaboration and more robust model development.

The integration of domain adaptation further mitigates issues associated with domain shift, ensuring that the models remain generalizable across disparate datasets. The identified domain adaptation strategies—each tailored to address specific domain discrepancies—provide a basis for further exploration in federated learning applications beyond fMRI analysis.

Future work might focus on refining the domain adaptation strategies, especially exploring adaptive mechanisms that dynamically select the appropriate adaptation techniques based on specific dataset characteristics. Additionally, enhancing the privacy-preserving aspects via differential privacy mechanisms and formally quantifying privacy guarantees will be crucial steps in advancing this research.

In summary, this paper offers a structured and practical solution to a critical issue in medical data analysis, with the potential for broader applications in privacy-sensitive fields.