A tutorial on multi-view autoencoders using the multi-view-AE library
Abstract: Interest in modelling multiple modalities (or views) of data has grown in recent years, for example to understand the relationship between modalities or to generate missing data. Multi-view autoencoders have gained significant traction because they adapt readily to the characteristics of the data at hand. However, the literature on multi-view autoencoders uses inconsistent notation, and the models are often implemented in different coding frameworks. To address this, we present a unified mathematical framework for multi-view autoencoders that consolidates their formulations, and we offer insights into the motivation and theoretical advantages of each model. To facilitate accessibility and practical use, we extend the documentation and functionality of the previously introduced multi-view-AE library, which offers Python implementations of numerous multi-view autoencoder models within a user-friendly framework. In benchmarking experiments, we evaluate our implementations against previous ones and show comparable or superior performance. This work aims to establish a cohesive foundation for multi-modal modelling and to serve as a valuable educational resource in the field.