Probabilistic partition of unity networks for high-dimensional regression problems (2210.02694v2)
Abstract: We explore the probabilistic partition of unity network (PPOU-Net) model in the context of high-dimensional regression problems and propose a general framework focused on adaptive dimensionality reduction. In the proposed framework, the target function is approximated by a mixture-of-experts model on a low-dimensional manifold, where each cluster is associated with a local fixed-degree polynomial. We present a training strategy that leverages the expectation-maximization (EM) algorithm: during training, we alternate between (i) applying gradient descent to update the deep neural network (DNN) coefficients and (ii) using closed-form formulae derived from the EM algorithm to update the mixture-of-experts parameters. Under the probabilistic formulation, step (ii) reduces to embarrassingly parallel weighted least-squares solves. In numerical experiments across a range of data dimensions, PPOU-Nets consistently outperform baseline fully connected neural networks of comparable size. We also apply the proposed model in quantum computing, where PPOU-Nets serve as surrogate models for the cost landscapes of variational quantum circuits.
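The closed-form step (ii) is the distinctive part of the training loop, so a minimal sketch may help. Below is an illustrative NumPy version of the per-expert weighted least-squares update, assuming a 1-D learned embedding `z`, a fixed polynomial degree, and responsibilities `gamma` produced by the DNN; the helper names (`design_matrix`, `m_step_expert`) and the randomly drawn stand-in data are hypothetical and not the paper's implementation.

```python
import numpy as np

def design_matrix(z, degree):
    """Vandermonde-style design matrix for a 1-D embedding z."""
    return np.stack([z**p for p in range(degree + 1)], axis=1)  # (N, degree+1)

def m_step_expert(Phi, y, gamma_k):
    """Closed-form weighted least-squares update for one expert.

    Solves (Phi^T diag(gamma_k) Phi) c = Phi^T diag(gamma_k) y,
    which is independent across experts and hence trivially parallel.
    """
    PhiW = Phi * gamma_k[:, None]  # weight each row by its responsibility
    coeffs, *_ = np.linalg.lstsq(PhiW.T @ Phi, PhiW.T @ y, rcond=None)
    return coeffs

# Toy usage: N samples, K experts, quadratic polynomials in z.
rng = np.random.default_rng(0)
N, K, degree = 200, 4, 2
z = rng.uniform(-1, 1, N)                  # stand-in for the DNN embedding
y = np.sin(np.pi * z) + 0.05 * rng.standard_normal(N)
gamma = rng.dirichlet(np.ones(K), size=N)  # stand-in responsibilities, rows sum to 1

Phi = design_matrix(z, degree)
experts = [m_step_expert(Phi, y, gamma[:, k]) for k in range(K)]

# Mixture-of-experts prediction: responsibility-weighted sum of the experts.
y_hat = sum(gamma[:, k] * (Phi @ experts[k]) for k in range(K))
```

Because each expert's solve involves only its own responsibility-weighted normal equations, the K solves share no state and can be dispatched in parallel, which is what the abstract means by "embarrassingly parallel"; in the full training loop this M-step would alternate with gradient-descent updates of the DNN that produces `z` and `gamma`.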