Sparse Cholesky Factorization for Solving Nonlinear PDEs via Gaussian Processes (2304.01294v3)
Abstract: In recent years, machine learning-based approaches to automating the solution of partial differential equations (PDEs) have been widely adopted. Among these approaches, Gaussian processes (GPs) and kernel methods have garnered considerable interest due to their flexibility, robust theoretical guarantees, and close ties to traditional methods. They transform the solution of general nonlinear PDEs into the solution of quadratic optimization problems with nonlinear, PDE-induced constraints. However, the complexity bottleneck lies in computing with dense kernel matrices obtained from pointwise evaluations of the covariance kernel and its partial derivatives; the derivative entries arise from the PDE constraints, and fast algorithms for such matrices are scarce. The primary goal of this paper is to provide a near-linear complexity algorithm for working with such kernel matrices. We present a sparse Cholesky factorization algorithm for these matrices based on the near-sparsity of the Cholesky factor under a novel ordering of pointwise and derivative measurements. The near-sparsity is rigorously justified by directly connecting the factor to GP regression and to the exponential decay of basis functions in numerical homogenization. We then employ the Vecchia approximation of GPs, which is optimal in Kullback-Leibler divergence, to compute the approximate factor. This enables us to compute $\epsilon$-approximate inverse Cholesky factors of the kernel matrices with complexity $O(N\log^d(N/\epsilon))$ in space and $O(N\log^{2d}(N/\epsilon))$ in time. We integrate the sparse Cholesky factorizations into optimization algorithms to obtain fast solvers for the nonlinear PDEs. We numerically illustrate our algorithm's near-linear space/time complexity for a broad class of nonlinear PDEs, such as the nonlinear elliptic, Burgers, and Monge-Ampère equations.
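To make the Vecchia/KL-minimization step concrete, the following is a minimal NumPy sketch (not the paper's released code) of the KL-optimal sparse inverse Cholesky factor for a kernel matrix of purely pointwise measurements: points are placed in a maximin (coarse-to-fine) ordering, each column's sparsity set consists of nearby points later in the ordering, and each column is computed in closed form from a small dense solve. The function names (`maximin_ordering`, `kl_inverse_cholesky`), the Matérn-3/2 kernel, and the radius parameter `rho` are illustrative assumptions; the paper's full algorithm additionally interleaves derivative measurements under its novel ordering and uses geometric data structures to reach near-linear complexity, both of which this quadratic-time sketch omits.

```python
# Minimal sketch (assumed names, not the paper's code) of the KL-optimal
# sparse inverse Cholesky factor, i.e. the Vecchia approximation, for a
# kernel matrix of pointwise measurements only. O(N^2) for clarity; the
# paper reaches near-linear complexity with geometric data structures.
import numpy as np

def matern32_kernel(X, Y, ell=0.3):
    """Matern-3/2 covariance; chosen because it exhibits the screening
    effect that makes the inverse Cholesky factor near-sparse."""
    r = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
    a = np.sqrt(3.0) * r / ell
    return (1.0 + a) * np.exp(-a)

def maximin_ordering(X):
    """Greedy farthest-point ordering: coarse points first. Returns the
    ordering and each point's length scale (distance to earlier points)."""
    N = len(X)
    order, scale = np.empty(N, dtype=int), np.empty(N)
    order[0] = 0
    dist = np.linalg.norm(X - X[0], axis=1)
    scale[0] = dist.max()
    for k in range(1, N):
        i = int(np.argmax(dist))
        order[k], scale[k] = i, dist[i]
        dist = np.minimum(dist, np.linalg.norm(X - X[i], axis=1))
    return order, scale

def kl_inverse_cholesky(X, kernel, rho=4.0):
    """Lower-triangular L with L @ L.T ~= Theta^{-1}, Theta = kernel(Xo, Xo).
    Column i is supported on later-ordered points within rho * scale[i] of
    x_i and is the KL minimizer in closed form:
        L[s, i] = Theta[s, s]^{-1} e_1 / sqrt((Theta[s, s]^{-1})_{11}).
    """
    order, scale = maximin_ordering(X)
    Xo, N = X[order], len(X)
    L = np.zeros((N, N))
    for i in range(N):
        d = np.linalg.norm(Xo[i:] - Xo[i], axis=1)
        s = i + np.flatnonzero(d <= rho * scale[i])   # s[0] == i since d=0
        Theta_s = kernel(Xo[s], Xo[s]) + 1e-10 * np.eye(len(s))  # jitter
        e1 = np.zeros(len(s)); e1[0] = 1.0
        col = np.linalg.solve(Theta_s, e1)            # small dense solve
        L[s, i] = col / np.sqrt(col[0])
    return L, order

# Usage: the factor is column-sparse, and L @ L.T @ Theta should be near I.
rng = np.random.default_rng(0)
X = rng.random((400, 2))
L, order = kl_inverse_cholesky(X, matern32_kernel)
Theta = matern32_kernel(X[order], X[order])
res = np.linalg.norm(L @ L.T @ Theta - np.eye(len(X))) / np.sqrt(len(X))
print(f"avg nonzeros/column: {np.count_nonzero(L) / len(X):.1f}, residual: {res:.2e}")
```

In this convention the single parameter `rho` controls the accuracy/sparsity tradeoff (larger `rho` means denser columns and smaller KL divergence); in the Vecchia/KL-minimization literature `rho` grows like $\log(N/\epsilon)$, which is where the logarithmic factors in the stated $O(N\log^d(N/\epsilon))$ space complexity come from.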