
Multimodal Federated Learning with Missing Modality via Prototype Mask and Contrast (2312.13508v2)

Published 21 Dec 2023 in cs.LG, cs.AI, and cs.DC

Abstract: In real-world scenarios, multimodal federated learning often faces the practical challenge of intricate modality missing, which constrains the design of federated frameworks and significantly degrades model inference accuracy. Existing solutions for missing modalities generally develop modality-specific encoders on clients and train modality fusion modules on servers. However, these methods are largely limited to settings with either unimodal clients or complete multimodal clients, and struggle to generalize to intricate modality-missing scenarios. In this paper, we introduce a prototype library into a FedAvg-based federated learning framework, enabling it to alleviate the global model performance degradation caused by missing modalities during both training and testing. The proposed method uses prototypes as masks representing missing modalities to formulate a task-calibrated training loss and a model-agnostic uni-modality inference strategy. In addition, a proximal term based on prototypes is constructed to enhance local training. Experimental results demonstrate the state-of-the-art performance of our approach: compared to the baselines, our method improves inference accuracy by 3.7% with 50% modality missing during training and by 23.8% during uni-modality inference. Code is available at https://github.com/BaoGuangYin/PmcmFL.
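
As a rough illustration of the mechanics the abstract describes (a per-class prototype library, prototypes used as masks for missing modalities, and a prototype-based proximal term during local training), here is a minimal PyTorch-style sketch. It is not the authors' PmcmFL implementation: the encoder/fusion/classifier names, the label-based prototype lookup, the EMA prototype update, and the assumption that prototypes share the modality embedding dimension are all simplifications for illustration, and the paper's contrastive objective and model-agnostic uni-modality inference strategy are not shown.

```python
import torch
import torch.nn.functional as F


def update_prototypes(prototypes, fused_emb, labels, momentum=0.9):
    """Maintain a per-class prototype library as an EMA of fused embeddings.

    `prototypes` is a dict mapping class id -> prototype tensor.
    """
    for c in labels.unique():
        class_mean = fused_emb[labels == c].mean(dim=0).detach()
        key = int(c)
        if key in prototypes:
            prototypes[key] = momentum * prototypes[key] + (1.0 - momentum) * class_mean
        else:
            prototypes[key] = class_mean
    return prototypes


def local_loss(img, txt, labels, prototypes, image_encoder, text_encoder,
               fusion_head, classifier, mu=0.1):
    """Task loss with prototype masks for missing modalities plus a
    prototype-based proximal term on the fused representation (sketch)."""
    # Look up each sample's class prototype; during local training the label
    # is known, so the prototype can stand in for a missing modality.
    proto = torch.stack([prototypes[int(y)] for y in labels])

    # Encode whichever modalities are present; the prototype acts as a mask
    # (stand-in embedding) for a modality that is missing. This assumes the
    # prototype lives in the same space as the modality embeddings.
    img_emb = image_encoder(img) if img is not None else proto
    txt_emb = text_encoder(txt) if txt is not None else proto

    fused = fusion_head(img_emb, txt_emb)
    task_loss = F.cross_entropy(classifier(fused), labels)

    # Prototype-based proximal term: keep local fused features close to the
    # global prototypes to stabilize local training, in the spirit of a
    # FedProx-style regularizer anchored on prototypes.
    prox_loss = F.mse_loss(fused, proto)
    return task_loss + mu * prox_loss
```

In a FedAvg-style round, each client would compute this loss on its local (possibly modality-incomplete) data, update its prototypes, and upload both model weights and prototypes for server-side aggregation; the details of that aggregation follow the paper rather than this sketch.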

Authors (8)
  1. Guangyin Bao (8 papers)
  2. Qi Zhang (785 papers)
  3. Duoqian Miao (25 papers)
  4. Zixuan Gong (10 papers)
  5. Liang Hu (64 papers)
  6. Ke Liu (597 papers)
  7. Yang Liu (2253 papers)
  8. Chongyang Shi (26 papers)
Citations (4)