Concurrent Density Estimation with Wasserstein Autoencoders: Some Statistical Insights (2312.06591v1)
Abstract: Variational Autoencoders (VAEs) have been a pioneering force in the realm of deep generative models. Among their many descendants, Wasserstein Autoencoders (WAEs) stand out in particular, offering both heightened generative quality and a strong theoretical backbone. A WAE consists of an encoding and a decoding network that form a bottleneck, with the prime objective of generating new samples resembling those it was trained on. In the process, it aims to achieve a target latent representation of the encoded data. Our work offers a theoretical understanding of the machinery behind WAEs. From a statistical viewpoint, we pose the problem as concurrent density estimation tasks based on neural network-induced transformations. This allows us to establish deterministic upper bounds on the errors WAEs realize. We also analyze how these stochastic errors propagate in the presence of adversaries. As a result, we explore both the large-sample properties of the reconstructed distribution and the resilience of WAE models.
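The encoder-decoder bottleneck and the target latent distribution described above can be made concrete with a short sketch. Below is a minimal illustration, assuming the common MMD-penalized WAE objective (reconstruction cost plus a kernel-based penalty matching the encoded codes to a prior); the PyTorch architecture, Gaussian prior, RBF kernel bandwidth, and penalty weight `lam` are illustrative assumptions, not the paper's construction.

```python
# Minimal WAE sketch, assuming the MMD-penalized variant with a Gaussian
# latent prior. All sizes and hyperparameters here are hypothetical.
import torch
import torch.nn as nn

d_x, d_z = 784, 8  # data and latent dimensions (illustrative)

encoder = nn.Sequential(nn.Linear(d_x, 128), nn.ReLU(), nn.Linear(128, d_z))
decoder = nn.Sequential(nn.Linear(d_z, 128), nn.ReLU(), nn.Linear(128, d_x))

def rbf_mmd2(z, z_prior, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD with an RBF kernel."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(z, z).mean() + k(z_prior, z_prior).mean() - 2 * k(z, z_prior).mean()

def wae_loss(x, lam=10.0):
    z = encoder(x)                             # latent codes
    x_hat = decoder(z)                         # reconstructions
    recon = (x - x_hat).pow(2).sum(1).mean()   # squared-error transport cost
    z_prior = torch.randn_like(z)              # target latent law N(0, I)
    return recon + lam * rbf_mmd2(z, z_prior)  # penalize latent mismatch

loss = wae_loss(torch.randn(64, d_x))  # e.g., one mini-batch of fake data
```

The two terms mirror the two concurrent estimation tasks the abstract refers to: the reconstruction cost controls how well the decoded distribution matches the data, while the latent penalty controls how well the encoded distribution matches the prior.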