Seeking the Sufficiency and Necessity Causal Features in Multimodal Representation Learning (2408.16577v2)
Abstract: Probability of necessity and sufficiency (PNS) measures the likelihood of a feature set being both necessary and sufficient for predicting an outcome. It has proven effective in guiding representation learning for unimodal data, enhancing both predictive performance and model robustness. Despite these benefits, extending PNS to multimodal settings remains unexplored. This extension presents unique challenges, as the conditions for PNS estimation, exogeneity and monotonicity, need to be reconsidered in a multimodal context. We address these challenges by first conceptualizing multimodal representations as comprising modality-invariant and modality-specific components. We then analyze how to compute PNS for each component while ensuring non-trivial PNS estimation. Based on these analyses, we formulate tractable optimization objectives that enable multimodal models to learn high-PNS representations. Experiments demonstrate the effectiveness of our method on both synthetic and real-world data.
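For readers unfamiliar with PNS, the sketch below illustrates the standard textbook case only, not the multimodal objectives of the paper. For a binary feature X and binary outcome Y, when exogeneity and monotonicity both hold, PNS is identifiable from observational data as P(Y=1 | X=1) - P(Y=1 | X=0). The function name `pns_estimate` and the toy data are hypothetical, chosen purely for illustration.

```python
from typing import Sequence


def pns_estimate(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in estimate of PNS for binary x and y.

    Assumes exogeneity and monotonicity, under which
    PNS = P(y=1 | x=1) - P(y=1 | x=0).
    This is an illustrative sketch, not the paper's estimator.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    n_x1 = sum(1 for xi in x if xi == 1)
    n_x0 = sum(1 for xi in x if xi == 0)
    if n_x1 == 0 or n_x0 == 0:
        raise ValueError("need at least one sample with x=1 and one with x=0")

    # Empirical conditional probabilities of the positive outcome.
    p_y1_given_x1 = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1) / n_x1
    p_y1_given_x0 = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1) / n_x0

    return p_y1_given_x1 - p_y1_given_x0


# Toy data (made up): the feature is present in the first four samples.
x = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 1, 1, 0, 0, 0, 0, 1]
print(pns_estimate(x, y))
```

A value near 1 would indicate a feature that is close to both necessary and sufficient for the outcome; a value near 0 indicates that the feature carries little such causal information under the stated assumptions.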