Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives (2309.00380v3)
Abstract: Devising deep latent variable models for multi-modal data has been a long-standing theme in machine learning research. Multi-modal Variational Autoencoders (VAEs) are a popular class of generative models that learn latent representations jointly explaining multiple modalities. Various objective functions for such models have been suggested, often motivated as lower bounds on the multi-modal data log-likelihood or from information-theoretic considerations. To encode latent variables from different modality subsets, Product-of-Experts (PoE) or Mixture-of-Experts (MoE) aggregation schemes have been routinely used and shown to yield different trade-offs, for instance, regarding their generative quality or consistency across multiple modalities. In this work, we consider a variational objective that can tightly approximate the data log-likelihood. We develop more flexible aggregation schemes that avoid the inductive biases of PoE or MoE approaches by combining encoded features from different modalities with permutation-invariant neural networks. Our numerical experiments illustrate trade-offs for multi-modal variational objectives and various aggregation schemes. We show that our variational objective and more flexible aggregation models are beneficial when the goal is to approximate the true joint distribution over observed modalities and latent variables in identifiable models.
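The abstract contrasts fixed PoE/MoE fusion rules with learned permutation-invariant aggregation of encoded modality features. Below is a minimal sketch in JAX, not taken from the paper, illustrating the structural difference: a closed-form Product-of-Experts combination of per-modality Gaussian encoders versus a Deep-Sets-style permutation-invariant pooling. All names and shapes (`poe_aggregate`, `deepsets_aggregate`, `W_embed`, etc.) are hypothetical assumptions for illustration only.

```python
# Minimal sketch (assumptions, not the paper's code): two ways to aggregate
# per-modality Gaussian encodings q(z | x_m) = N(mu_m, diag(exp(logvar_m)))
# into a single joint encoding q(z | x_1, ..., x_M).
import jax
import jax.numpy as jnp


def poe_aggregate(mus, logvars):
    """Product-of-Experts: precision-weighted product of the per-modality
    Gaussian experts and a standard-normal prior expert (closed form, but a
    fixed inductive bias on how modalities are combined)."""
    mus = jnp.concatenate([mus, jnp.zeros((1, mus.shape[1]))], axis=0)
    logvars = jnp.concatenate([logvars, jnp.zeros((1, logvars.shape[1]))], axis=0)
    precisions = jnp.exp(-logvars)                 # shape (M + 1, d)
    joint_var = 1.0 / jnp.sum(precisions, axis=0)  # shape (d,)
    joint_mu = joint_var * jnp.sum(precisions * mus, axis=0)
    return joint_mu, jnp.log(joint_var)


def deepsets_aggregate(params, features):
    """Deep-Sets-style aggregation: embed each modality's encoded feature,
    pool by summation (order-invariant), then map the pooled vector to the
    parameters of the joint Gaussian encoding."""
    h = jax.nn.relu(features @ params["W_embed"])  # shape (M, hidden)
    pooled = jnp.sum(h, axis=0)                    # invariant to modality order
    return pooled @ params["W_mu"], pooled @ params["W_logvar"]


if __name__ == "__main__":
    key = jax.random.PRNGKey(0)
    mus = jax.random.normal(key, (3, 4))           # M = 3 modalities, d = 4 latents
    logvars = jnp.zeros((3, 4))
    print(poe_aggregate(mus, logvars))

    k1, k2, k3, k4 = jax.random.split(key, 4)
    params = {
        "W_embed": jax.random.normal(k1, (8, 16)) * 0.1,
        "W_mu": jax.random.normal(k2, (16, 4)) * 0.1,
        "W_logvar": jax.random.normal(k3, (16, 4)) * 0.1,
    }
    features = jax.random.normal(k4, (3, 8))       # one 8-dim feature per modality
    print(deepsets_aggregate(params, features))
```

The paper's aggregation models and its tighter variational objective are richer than this; the sketch only highlights the contrast between a fixed fusion rule (PoE) and a learned, permutation-invariant combination of encoded features.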
Authors: Marcel Hirt, Domenico Campolo, Victoria Leong, Juan-Pablo Ortega