Gradient-Guided Modality Decoupling for Missing-Modality Robustness (2402.16318v1)
Abstract: Multimodal learning with incomplete input data (missing modality) is practical and challenging. In this work, we conduct an in-depth analysis of this challenge and find that modality dominance has a significant negative impact on model training, greatly degrading missing-modality performance. Motivated by Grad-CAM, we introduce gradients as a novel indicator to monitor and reduce modality dominance, which is widespread in the missing-modality scenario. With the aid of this indicator, we present a novel Gradient-guided Modality Decoupling (GMD) method to decouple the dependency on dominating modalities. Specifically, GMD removes the conflicting gradient components from different modalities to achieve this decoupling, significantly improving performance. In addition, to flexibly handle modal-incomplete data, we design a parameter-efficient Dynamic Sharing (DS) framework that adaptively switches network parameters on or off depending on whether a modality is available. We conduct extensive experiments on three popular multimodal benchmarks: BraTS 2018 for medical segmentation, and CMU-MOSI and CMU-MOSEI for sentiment analysis. The results show that our method significantly outperforms competing methods, demonstrating the effectiveness of the proposed solutions. Our code is released here: https://github.com/HaoWang420/Gradient-guided-Modality-Decoupling.
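To make "removing conflicting gradient components" concrete, here is a minimal sketch of one common way to do it: a projection in the style of gradient surgery for multi-task learning (PCGrad). This is an illustration under assumptions, not the authors' released implementation. The function names `dot`, `remove_conflict`, and `decouple` are hypothetical, gradients are flat Python lists, and two gradients are treated as conflicting when their dot product is negative.

```python
def dot(a, b):
    """Inner product of two equal-length gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def remove_conflict(g, h):
    """Return g with its component along h removed, but only when
    g and h conflict (negative dot product). Assumed PCGrad-style rule."""
    d = dot(g, h)
    hh = dot(h, h)
    if d >= 0 or hh == 0:
        return list(g)  # no conflict (or h is zero): leave g unchanged
    scale = d / hh
    return [x - scale * y for x, y in zip(g, h)]


def decouple(grads):
    """Project each per-modality gradient against the original gradients
    of the other modalities, then sum the results into one update."""
    projected = []
    for i, g in enumerate(grads):
        g_new = list(g)
        for j, h in enumerate(grads):
            if i != j:
                g_new = remove_conflict(g_new, h)
        projected.append(g_new)
    return [sum(col) for col in zip(*projected)]


# Hypothetical gradients from two modality-specific losses.
g_a = [1.0, 0.0]
g_b = [-1.0, 1.0]  # dot(g_a, g_b) = -1, so the two conflict
update = decouple([g_a, g_b])
```

After projection, each gradient is orthogonal to the one it conflicted with, so neither modality's update directly opposes the other's. Which losses produce the per-modality gradients, and whether the projection is applied per layer or over all parameters, are details this sketch leaves open.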
Authors: Hao Wang, Shengda Luo, Guosheng Hu, Jianguo Zhang