Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework (2302.12247v5)
Abstract: The recent explosion of interest in multimodal applications has resulted in a wide selection of datasets and methods for representing and integrating information from different modalities. Despite these empirical advances, fundamental research questions remain: How can we quantify the interactions that are necessary to solve a multimodal task? And what are the most suitable multimodal models to capture these interactions? To answer these questions, we propose an information-theoretic approach to quantify the degree of redundancy, uniqueness, and synergy relating input modalities with an output task. We call these three measures the partial information decomposition (PID) statistics of a multimodal distribution (or PID for short), and introduce two new estimators for these PID statistics that scale to high-dimensional distributions. To validate PID estimation, we conduct extensive experiments on both synthetic datasets where the PID is known and large-scale multimodal benchmarks where PID estimates are compared with human annotations. Finally, we demonstrate the usefulness of these estimators in (1) quantifying interactions within multimodal datasets, (2) quantifying interactions captured by multimodal models, (3) principled approaches for model selection, and (4) three real-world case studies engaging with domain experts in pathology, mood prediction, and robotic perception, where our framework helps to recommend strong multimodal models for each application.
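The three interaction types named in the abstract can be made concrete on toy binary distributions. The sketch below is not one of the paper's estimators; it only computes plain mutual information in bits for three hand-built joints over (x1, x2, y). All names in it (`mutual_information`, `summarize`, `redundant`, `unique`, `synergistic`) are hypothetical and chosen for illustration. It assumes the standard PID consistency relations I(X1;Y) = R + U1, I(X2;Y) = R + U2, and I(X1,X2;Y) = R + U1 + U2 + S with non-negative terms; for these particular toy cases those relations alone determine which term carries the full bit.

```python
import math
from collections import defaultdict
from itertools import product


def mutual_information(joint, a_idx, b_idx):
    """Return I(A;B) in bits for a discrete joint {outcome_tuple: probability}.

    a_idx and b_idx select which positions of each outcome tuple form A and B.
    """
    p_ab, p_a, p_b = defaultdict(float), defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        a = tuple(outcome[i] for i in a_idx)
        b = tuple(outcome[i] for i in b_idx)
        p_ab[(a, b)] += p
        p_a[a] += p
        p_b[b] += p
    return sum(
        p * math.log2(p / (p_a[a] * p_b[b]))
        for (a, b), p in p_ab.items()
        if p > 0
    )


def summarize(joint):
    """Return (I(X1;Y), I(X2;Y), I(X1,X2;Y)) for outcomes ordered (x1, x2, y)."""
    return (
        mutual_information(joint, (0,), (2,)),
        mutual_information(joint, (1,), (2,)),
        mutual_information(joint, (0, 1), (2,)),
    )


# Redundancy: both modalities are copies of the label.
redundant = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

# Uniqueness: the label copies x1; x2 is an independent fair coin.
unique = {(x1, x2, x1): 0.25 for x1, x2 in product((0, 1), repeat=2)}

# Synergy: the label is XOR of two independent fair coins.
synergistic = {(x1, x2, x1 ^ x2): 0.25 for x1, x2 in product((0, 1), repeat=2)}

for name, joint in [
    ("redundant", redundant),
    ("unique", unique),
    ("synergistic", synergistic),
]:
    i1, i2, i12 = summarize(joint)
    print(f"{name}: I(X1;Y)={i1:.2f}  I(X2;Y)={i2:.2f}  I(X1,X2;Y)={i12:.2f}")
```

In the XOR case each modality alone carries zero bits about the label while the pair carries one bit, so under the relations above the whole bit is synergy. In the copied-label case every quantity equals one bit, which forces the bit into redundancy. For general distributions the three mutual informations do not pin down the decomposition, which is why dedicated estimators are needed.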