Distributed Harmonization: Federated Clustered Batch Effect Adjustment and Generalization (2405.15081v3)
Abstract: Independent and identically distributed (i.i.d.) data is essential to many data analysis and modeling techniques. In the medical domain, data is decentralized by nature, so collecting it from multiple sites or institutions is a common way to ensure sufficient clinical diversity. However, data from different sites is easily biased by the local environment or facilities, which violates the i.i.d. assumption. The usual remedy is to harmonize away the site bias while retaining important biological information. ComBat is among the most popular harmonization approaches and has recently been extended to handle distributed sites. However, when new sites join during training, or when data from unknown or unseen sites must be evaluated, ComBat lacks compatibility and requires retraining with data from all sites. This retraining incurs significant computational and logistical overhead that is usually prohibitive. In this work, we develop a novel Cluster ComBat harmonization algorithm, which leverages cluster patterns of the data in different sites and greatly improves the usability of ComBat harmonization. We use extensive simulations and real medical imaging data from ADNI to demonstrate the superiority of the proposed approach. Our code is available at https://github.com/illidanlab/distributed-cluster-harmonization.
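To make the idea of cluster-level harmonization concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It assumes a plain location-scale adjustment (no empirical Bayes shrinkage and no biological covariates, both of which ComBat uses), cluster labels that are given in advance rather than learned, and nearest-mean assignment of an unseen site to a cluster. All function names are hypothetical.

```python
import numpy as np

def fit_cluster_params(X, labels):
    """Estimate per-cluster feature means/stds and the pooled target mean/std.

    Assumption: `labels` come from some earlier clustering step; here they
    are simply passed in.
    """
    params = {}
    for c in np.unique(labels):
        Xc = X[labels == c]
        # Small constant avoids division by zero for constant features.
        params[int(c)] = (Xc.mean(axis=0), Xc.std(axis=0) + 1e-8)
    return params, X.mean(axis=0), X.std(axis=0) + 1e-8

def nearest_cluster(X_site, params):
    """Assign a site to the cluster whose mean is closest to the site mean."""
    site_mean = X_site.mean(axis=0)
    return min(params, key=lambda c: np.linalg.norm(site_mean - params[c][0]))

def harmonize(X_site, cluster, params, grand_mean, grand_std):
    """Remove the cluster's location/scale and map onto the pooled scale."""
    mean_c, std_c = params[cluster]
    return (X_site - mean_c) / std_c * grand_std + grand_mean

# Simulated training data: two clusters of sites with different bias.
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(200, 3))
X1 = rng.normal(5.0, 2.0, size=(200, 3))
X = np.vstack([X0, X1])
labels = np.repeat([0, 1], 200)

params, grand_mean, grand_std = fit_cluster_params(X, labels)

# An unseen site is harmonized with stored cluster parameters only;
# nothing is re-estimated from the training sites.
X_new = rng.normal(5.0, 2.0, size=(50, 3))
cluster = nearest_cluster(X_new, params)
X_harmonized = harmonize(X_new, cluster, params, grand_mean, grand_std)
```

In this sketch, the unseen site reuses parameters already estimated for its nearest cluster, which is the property the abstract highlights: no retraining with data from all sites.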