An Information Criterion for Controlled Disentanglement of Multimodal Data (2410.23996v2)

Published 31 Oct 2024 in cs.LG, cs.AI, cs.IT, and math.IT

Abstract: Multimodal representation learning seeks to relate and decompose information inherent in multiple modalities. By disentangling modality-specific information from information that is shared across modalities, we can improve interpretability and robustness, and enable downstream tasks such as the generation of counterfactual outcomes. Separating the two types of information is challenging, since they are often deeply entangled in many real-world applications. We propose Disentangled Self-Supervised Learning (DisentangledSSL), a novel self-supervised approach for learning disentangled representations. We present a comprehensive analysis of the optimality of each disentangled representation, particularly focusing on the scenario, not covered in prior work, where the so-called Minimum Necessary Information (MNI) point is not attainable. We demonstrate that DisentangledSSL successfully learns shared and modality-specific features on multiple synthetic and real-world datasets, and consistently outperforms baselines on various downstream tasks, including prediction tasks for vision-language data as well as molecule-phenotype retrieval tasks for biological data. The code is available at https://github.com/uhlerlab/DisentangledSSL.
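
The abstract invokes the Minimum Necessary Information (MNI) point without defining it. As a hedged aid, the block below states the standard definition from the information bottleneck literature (Fischer's conditional entropy bottleneck); the paper's exact notation and assumptions may differ. For a shared representation Z extracted from modality X_1, with the other modality X_2 as the relevance target, the MNI point is:

% MNI point for Z = f(X_1) with relevance target X_2
% (standard IB-literature definition; this paper's notation may differ).
\[
  I(Z; X_2) \;=\; I(Z; X_1) \;=\; I(X_1; X_2)
\]

Here sufficiency, I(Z; X_2) = I(X_1; X_2), means Z retains everything X_1 reveals about X_2, while minimality, I(Z; X_1) = I(X_1; X_2), means Z carries nothing beyond that shared information. When shared and modality-specific signals are deeply entangled, no encoder can satisfy both equalities at once, i.e., the MNI point is not attainable; that regime is the one the paper's optimality analysis addresses.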
