Towards Multi-modal Transformers in Federated Learning (2404.12467v2)

Published 18 Apr 2024 in cs.CV and cs.LG

Abstract: Multi-modal transformers mark significant progress in different domains, but siloed high-quality data hinders their further improvement. To remedy this, federated learning (FL) has emerged as a promising privacy-preserving paradigm for training models without direct access to the raw data held by different clients. Despite its potential, a considerable research direction remains unexplored in FL: unpaired uni-modal clients combined with the transformer architecture. To fill this gap, this paper explores a transfer multi-modal federated learning (MFL) scenario within the vision-language domain, where clients possess data of various modalities distributed across different datasets. We systematically evaluate the performance of existing methods when a transformer architecture is utilized and introduce a novel framework called Federated modality complementary and collaboration (FedCola), which addresses the in-modality and cross-modality gaps among clients. Through extensive experiments across various FL settings, FedCola demonstrates superior performance over previous approaches, offering new perspectives on future federated training of multi-modal transformers.

Authors (5)

Guangyu Sun
Matias Mendieta
Aritra Dutta
Xin Li
Chen Chen
