
Bidirectional Contrastive Split Learning for Visual Question Answering (2208.11435v4)

Published 24 Aug 2022 in cs.CV and cs.LG

Abstract: Visual Question Answering (VQA) based on multi-modal data facilitates real-life applications such as home robots and medical diagnosis. One significant challenge is to devise a robust decentralized learning framework for various client models where centralized data collection is avoided due to confidentiality concerns. This work tackles privacy-preserving VQA by decoupling a multi-modal model into representation modules and a contrastive module, and by leveraging inter-module gradient sharing and inter-client weight sharing. To this end, we propose Bidirectional Contrastive Split Learning (BiCSL) to train a global multi-modal model on the entire data distribution of decentralized clients. We employ a contrastive loss that enables more efficient self-supervised learning of the decentralized modules. Comprehensive experiments are conducted on the VQA-v2 dataset with five SOTA VQA models, demonstrating the effectiveness of the proposed method. Furthermore, we inspect BiCSL's robustness against a dual-key backdoor attack on VQA. BiCSL shows much better robustness to this multi-modal adversarial attack than the centralized learning method, which makes it a promising approach to decentralized multi-modal learning.
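
To make the learning setup described in the abstract more concrete, below is a minimal sketch of split learning with a contrastive objective: a client-side representation module produces image and question embeddings, a server-side module computes an InfoNCE-style contrastive loss on the cut-layer activations, and the cut-layer gradients are sent back so that each side updates only its own parameters. All module names, feature dimensions, and the exact loss form are illustrative assumptions, not the authors' BiCSL implementation (which additionally involves inter-client weight sharing and bidirectional exchange not shown here).

```python
# Hypothetical sketch of split learning with a contrastive objective.
# Module names, dimensions, and the InfoNCE-style loss are assumptions
# for illustration; they are not taken from the BiCSL paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ClientRepresentation(nn.Module):
    """Client-side encoders for image and question features (stand-ins)."""
    def __init__(self, img_dim=2048, txt_dim=300, hid=512):
        super().__init__()
        self.img_enc = nn.Linear(img_dim, hid)
        self.txt_enc = nn.Linear(txt_dim, hid)

    def forward(self, img_feat, txt_feat):
        return self.img_enc(img_feat), self.txt_enc(txt_feat)


class ServerContrastive(nn.Module):
    """Server-side projection heads feeding a symmetric contrastive loss."""
    def __init__(self, hid=512, proj=128):
        super().__init__()
        self.img_proj = nn.Linear(hid, proj)
        self.txt_proj = nn.Linear(hid, proj)

    def forward(self, img_h, txt_h, temperature=0.07):
        zi = F.normalize(self.img_proj(img_h), dim=-1)
        zt = F.normalize(self.txt_proj(txt_h), dim=-1)
        logits = zi @ zt.t() / temperature      # pairwise similarities
        labels = torch.arange(zi.size(0))       # matched pairs on the diagonal
        return 0.5 * (F.cross_entropy(logits, labels) +
                      F.cross_entropy(logits.t(), labels))


# One split-learning step: the client sends cut-layer ("smashed") activations
# to the server, the server computes the contrastive loss and returns the
# gradients w.r.t. those activations, and each side updates its own weights.
client, server = ClientRepresentation(), ServerContrastive()
opt_c = torch.optim.Adam(client.parameters(), lr=1e-4)
opt_s = torch.optim.Adam(server.parameters(), lr=1e-4)

img = torch.randn(8, 2048)   # dummy image features (e.g., from a detector)
txt = torch.randn(8, 300)    # dummy question embeddings (e.g., GloVe vectors)

img_h, txt_h = client(img, txt)
# Detach before "sending" so the server's graph stops at the cut layer.
img_s = img_h.detach().requires_grad_(True)
txt_s = txt_h.detach().requires_grad_(True)

loss = server(img_s, txt_s)
opt_s.zero_grad()
loss.backward()              # grads for server params and cut-layer tensors
opt_s.step()

# "Return" the cut-layer gradients and finish the client's backward pass.
opt_c.zero_grad()
torch.autograd.backward([img_h, txt_h], [img_s.grad, txt_s.grad])
opt_c.step()
```

In a full BiCSL-style deployment this step would repeat across many clients, with client-side weights additionally shared or aggregated between clients; the sketch above covers only the inter-module gradient exchange between one client and the server.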
