Federated Learning in Distributed Medical Databases: Meta-Analysis of Large-Scale Subcortical Brain Data (1810.08553v4)

Published 19 Oct 2018 in stat.ML, cs.LG, q-bio.NC, and q-bio.QM

Abstract: At this moment, databanks worldwide contain brain images of previously unimaginable numbers. Combined with developments in data science, these massive data provide the potential to better understand the genetic underpinnings of brain diseases. However, different datasets, which are stored at different institutions, cannot always be shared directly due to privacy and legal concerns, thus limiting the full exploitation of big data in the study of brain disorders. Here we propose a federated learning framework for securely accessing and meta-analyzing any biomedical data without sharing individual information. We illustrate our framework by investigating brain structural relationships across diseases and clinical cohorts. The framework is first tested on synthetic data and then applied to multi-centric, multi-database studies including ADNI, PPMI, MIRIAD and UK Biobank, showing the potential of the approach for further applications in distributed analysis of multi-centric cohorts.

Citations (167)

Summary

Federated Learning in Distributed Medical Databases: A Meta-Analysis of Large-Scale Subcortical Brain Data

The paper "Federated Learning in Distributed Medical Databases: Meta-Analysis of Large-Scale Subcortical Brain Data" introduces a federated learning framework for neuroimaging datasets that are distributed across institutions. It targets the privacy and data-sharing restrictions that prevent pooled analyses of brain disorders on large-scale, multi-centric data.

Federated Framework Proposal

The authors propose an end-to-end pipeline that combines data standardization, correction of confounding factors, and multivariate analysis, using the Alternating Direction Method of Multipliers (ADMM) for the distributed optimization steps. The framework is designed to keep the number of iterations needed for convergence low, which reduces communication among centers, a common bottleneck in federated settings.
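To make the role of ADMM concrete, the following is a minimal sketch of textbook consensus ADMM for a least-squares regression split across centers. It is an illustration under stated assumptions, not the paper's constrained formulation: the function names, the `(X, y)` layout (confounders such as age and sex as columns of `X`, one imaging feature as `y`), and the stopping rule are all hypothetical. Each center solves a small local system and exchanges only parameter vectors, never rows of data.

```python
import numpy as np

def local_update(X, y, z, u, rho):
    """One center's ADMM step: a regularized solve that uses only local data."""
    d = X.shape[1]
    A = X.T @ X + rho * np.eye(d)
    b = X.T @ y + rho * (z - u)
    return np.linalg.solve(A, b)

def federated_admm_regression(centers, rho, n_iter=500, tol=1e-10):
    """Consensus ADMM for least squares (illustrative, not the paper's code).

    centers: list of (X, y) pairs, one per site (hypothetical layout).
    rho: ADMM penalty; it affects convergence speed, not the solution.
    Returns the shared coefficients z and the number of rounds used.
    """
    d = centers[0][0].shape[1]
    z = np.zeros(d)                        # shared (consensus) coefficients
    us = [np.zeros(d) for _ in centers]    # scaled dual variable per center
    rounds = 0
    for rounds in range(1, n_iter + 1):
        # Local step: each center sends back only its vector w_k.
        ws = [local_update(X, y, z, u, rho) for (X, y), u in zip(centers, us)]
        # Server step: average the local estimates plus duals.
        z_new = np.mean([w + u for w, u in zip(ws, us)], axis=0)
        # Dual step: accumulate each center's disagreement with the consensus.
        us = [u + w - z_new for u, w in zip(us, ws)]
        primal = np.sqrt(sum(np.sum((w - z_new) ** 2) for w in ws))
        dual = rho * np.sqrt(len(centers)) * np.linalg.norm(z_new - z)
        z = z_new
        if primal < tol and dual < tol:
            break
    return z, rounds

def remove_confounds(X, y, z):
    """Residual of the feature after subtracting the fitted confounder effect."""
    return y - X @ z
```

At convergence the shared vector `z` matches the least-squares fit on the pooled data, so the residuals returned by `remove_confounds` are the same as if the data had been centralized. Each iteration corresponds to one communication round, which is why a low iteration count matters in this setting.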

Methodology Overview

Data Standardization: An iterative procedure across centers rescales every feature to zero mean and unit standard deviation using global statistics, which makes the datasets comparable.
Confounding Factor Correction: Bias from factors such as age and sex is handled with a constrained regression model whose parameters are estimated consistently across centers via ADMM. Confounding effects are thus removed without sharing individual-level data.
Federated PCA (fPCA): This step reduces data dimensionality with a distributed principal component analysis in which centers share only the eigen-components of their local covariance matrices. This lowers the amount of data transmitted while preserving the main modes of variation.
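The standardization and fPCA steps above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it computes the global mean and standard deviation in a single pass from per-center sums rather than with the iterative scheme the paper describes, and all function names are hypothetical. Each center shares only summary statistics and the top `q` eigenpairs of its local scatter matrix; the server rebuilds an approximate global covariance from those pieces.

```python
import numpy as np

def local_moments(X):
    """Per-center summary: sample count, feature sums, sums of squares."""
    return X.shape[0], X.sum(axis=0), (X ** 2).sum(axis=0)

def global_mean_std(moments):
    """Combine per-center summaries into a global mean and std per feature."""
    n = sum(m[0] for m in moments)
    s1 = sum(m[1] for m in moments)
    s2 = sum(m[2] for m in moments)
    mean = s1 / n
    std = np.sqrt(s2 / n - mean ** 2)  # population std (ddof=0)
    return mean, std

def local_eigen(X_std, q):
    """Top-q eigenpairs of one center's scatter matrix X^T X."""
    vals, vecs = np.linalg.eigh(X_std.T @ X_std)  # eigh returns ascending order
    idx = np.argsort(vals)[::-1][:q]
    return vals[idx], vecs[:, idx]

def federated_pca(shared, n_total, n_components):
    """Rebuild an approximate global covariance from shared eigenpairs.

    shared: list of (eigenvalues, eigenvectors) pairs, one per center.
    Returns the leading eigenvalues and components of the rebuilt matrix.
    """
    d = shared[0][1].shape[0]
    S = np.zeros((d, d))
    for vals, vecs in shared:
        S += (vecs * vals) @ vecs.T  # V diag(lambda) V^T for this center
    vals, vecs = np.linalg.eigh(S / n_total)
    idx = np.argsort(vals)[::-1][:n_components]
    return vals[idx], vecs[:, idx]
```

When every center keeps all eigenpairs (`q` equal to the number of features), the rebuilt covariance equals the pooled one and the result matches a centralized PCA. Choosing a smaller `q` cuts the amount transmitted at the cost of an approximation.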

Experimental Validation

The framework is validated on synthetic data and on real-world data drawn from major neuroimaging databases, including ADNI, PPMI, MIRIAD, and UK Biobank. The federated approach is benchmarked against traditional methods and is applied to subcortical thickness and shape features across neurological conditions such as Alzheimer's disease (AD) and Parkinson's disease (PD).

Results

The federated PCA captured variability between cohort groups, notably separating healthy controls from AD subjects, though with some residual effects that may reflect age-related variability. In the subcortical mapping, the principal components emphasized features of the hippocampus and amygdala, structures that are clinically relevant for neurodegenerative diagnoses.

Implications and Future Prospects

The paper offers a viable alternative to centralizing sensitive data in multi-centric biomedical analyses. The methodology may extend to multimodal analyses, potentially including imaging-genetics studies, which involve high-dimensional covariance structures.

For federated learning more broadly, the work points to secure, large-scale analyses in settings where privacy concerns are paramount, addressing neuroimaging data challenges through collaborative yet decentralized approaches. Future work could refine the confound-correction algorithms to further reduce residual variability and support more precise harmonized analyses.

Federated learning frameworks of this kind are a step toward comprehensive and secure exploration of neuroimaging datasets, which could support progress in understanding, diagnosing, and treating complex brain disorders.