
Distributionally Robust Alignment for Medical Federated Vision-Language Pre-training Under Data Heterogeneity (2404.03854v3)

Published 5 Apr 2024 in cs.LG, cs.CL, and cs.CV

Abstract: Vision-language pre-training (VLP) has emerged as an effective scheme for multimodal representation learning, but its reliance on large-scale multimodal data poses significant challenges for medical applications. Federated learning (FL) offers a promising solution to scale up the dataset for medical VLP while preserving data privacy. However, we observe that client data heterogeneity in real-world scenarios could cause models to learn biased cross-modal alignment during local pre-training. This would limit the transferability of the federated representation model to downstream tasks. To address this challenge, we propose Federated Distributionally Robust Alignment (FedDRA), a framework for federated VLP that achieves robust vision-language alignment under heterogeneous conditions. Based on client datasets, we construct a distribution family that encompasses potential test-time domains, and apply a distributionally robust framework to optimize the pre-trained model's performance across this distribution space. This approach bridges the gap between pre-training samples and downstream applications. To avoid over-fitting to client-specific information, we use anchor representations from the global model to guide the local training, and adopt a two-stage approach that first tunes deeper layers before updating the entire network. Extensive experiments on real-world datasets demonstrate FedDRA's effectiveness in enhancing medical federated VLP under data heterogeneity. Our method also adapts well to various medical pre-training methods.
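The two ideas in the abstract can be illustrated with a minimal, hypothetical PyTorch sketch (not the authors' released code): the server keeps a weight vector over clients on the probability simplex and up-weights clients whose local alignment loss is high, which is the standard exponentiated-gradient form of a distributionally robust objective over a mixture of client distributions; locally, each client adds a penalty pulling its embeddings toward anchor representations from the global model. The function names (`dro_reweight`, `anchored_local_loss`) and the hyperparameters `eta` and `lam` are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def dro_reweight(client_losses: torch.Tensor, weights: torch.Tensor,
                 eta: float = 0.1) -> torch.Tensor:
    """One exponentiated-gradient ascent step on the client-weight simplex.

    Clients with higher local alignment loss receive more weight, so the
    aggregated objective tracks the worst-case mixture of client
    distributions rather than the plain empirical average.
    """
    weights = weights * torch.exp(eta * client_losses.detach())
    return weights / weights.sum()  # project back onto the simplex

def anchored_local_loss(local_emb: torch.Tensor,
                        anchor_emb: torch.Tensor,
                        align_loss: torch.Tensor,
                        lam: float = 0.5) -> torch.Tensor:
    """Local VLP alignment loss plus a pull toward the global model's
    'anchor' embeddings, discouraging over-fitting to client-specific cues.
    """
    anchor_penalty = F.mse_loss(local_emb, anchor_emb.detach())
    return align_loss + lam * anchor_penalty

# Toy server round: 3 clients, uniform initial weights.
weights = torch.ones(3) / 3
client_losses = torch.tensor([0.8, 0.3, 1.2])  # per-client alignment losses
weights = dro_reweight(client_losses, weights)
print(weights)  # the hardest client (loss 1.2) now carries the most weight
```

The paper's two-stage schedule (tuning deeper layers before the full network) would sit around the local step above, e.g. by restricting which parameters receive gradients in early rounds.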

