Balanced Multi-modal Federated Learning via Cross-Modal Infiltration (2401.00894v1)

Published 31 Dec 2023 in cs.LG, cs.CV, and cs.MM

Abstract: Federated learning (FL) underpins advances in privacy-preserving distributed computing by collaboratively training neural networks without exposing clients' raw data. Current FL paradigms focus primarily on uni-modal data, while exploiting the knowledge in distributed multimodal data remains largely unexplored. Existing multimodal FL (MFL) solutions are mainly designed for statistical or modality heterogeneity on the input side, but have yet to address a fundamental issue, "modality imbalance," in distributed settings, which can lead to inadequate information exploitation and heterogeneous knowledge aggregation across modalities. In this paper, we propose a novel Cross-Modal Infiltration Federated Learning (FedCMI) framework that effectively alleviates modality imbalance and knowledge heterogeneity via knowledge transfer from the globally dominant modality. To avoid losing information in the weak modality by merely imitating the behavior of the dominant modality, we design a two-projector module that integrates knowledge from the dominant modality while still promoting local feature exploitation in the weak modality. In addition, we introduce a class-wise temperature adaptation scheme to achieve fair performance across classes. Extensive experiments on popular datasets confirm that the proposed framework fully exploits the information of each modality in MFL.
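The two mechanisms named in the abstract can be sketched in code. The following is an illustrative PyTorch sketch, not the authors' implementation: a weak-modality head with two projectors (a local projector for modality-specific features and an "infiltration" projector aligned to the dominant modality), plus a distillation loss whose temperature varies per class. All names (`TwoProjectorHead`, `classwise_kd_loss`, the `temperatures` vector) and architectural details are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoProjectorHead(nn.Module):
    """Hypothetical weak-modality head: the local projector preserves
    modality-specific features, while the infiltration projector maps into
    the dominant modality's feature space so its knowledge can be
    distilled in without overwriting the weak modality's own features."""
    def __init__(self, in_dim: int, feat_dim: int, num_classes: int):
        super().__init__()
        self.local_proj = nn.Linear(in_dim, feat_dim)   # local feature exploitation
        self.infil_proj = nn.Linear(in_dim, feat_dim)   # cross-modal alignment
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, x: torch.Tensor):
        z_local = F.relu(self.local_proj(x))
        z_infil = F.relu(self.infil_proj(x))
        logits = self.classifier(torch.cat([z_local, z_infil], dim=-1))
        return logits, z_infil  # z_infil is the dominant-modality-aligned branch

def classwise_kd_loss(student_logits, teacher_logits, labels, temperatures):
    """Temperature-scaled KL distillation where each sample's temperature
    is looked up by its class, so different classes can be softened
    differently (a class-wise temperature adaptation, as the abstract
    describes at a high level)."""
    t = temperatures[labels].unsqueeze(1)                      # (B, 1)
    p_teacher = F.softmax(teacher_logits / t, dim=1)
    log_p_student = F.log_softmax(student_logits / t, dim=1)
    # per-sample KL, rescaled by t^2 as in standard distillation
    per_sample = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1)
    return (per_sample * t.squeeze(1) ** 2).mean()
```

In this reading, the teacher logits would come from the dominant modality's branch; how FedCMI actually aggregates them across clients is not specified in the abstract.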

Authors (5)
  1. Yunfeng Fan
  2. Wenchao Xu
  3. Haozhao Wang
  4. Jiaqi Zhu
  5. Song Guo