Refined generalization analysis of the Deep Ritz Method and Physics-Informed Neural Networks (2401.12526v3)

Published 23 Jan 2024 in math.NA and cs.NA

Abstract: In this paper, we present refined generalization bounds for the Deep Ritz Method (DRM) and Physics-Informed Neural Networks (PINNs). For the DRM, we focus on two prototype elliptic PDEs: the Poisson equation and the static Schrödinger equation on the $d$-dimensional unit hypercube with the Neumann boundary condition. Sharper generalization bounds are derived via localization techniques, under the assumption that the exact solutions of the PDEs lie in Barron spaces or general Sobolev spaces. For the PINNs, we investigate general linear second order elliptic PDEs with the Dirichlet boundary condition via local Rademacher complexity in the multi-task learning setting.
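
For orientation, here is a minimal sketch of the objectives behind the two methods, in standard form (the exact normalizations, penalty weights, and well-posedness constraints may differ from the paper's precise setup). The DRM minimizes the variational energy of the elliptic problem; for the Poisson equation $-\Delta u = f$ and the static Schrödinger equation $-\Delta u + V u = f$ on $\Omega = (0,1)^d$ with Neumann boundary conditions,

$$\mathcal{E}_{\mathrm{Poisson}}(u) = \int_{\Omega} \Big( \tfrac{1}{2} |\nabla u|^2 - f u \Big)\, dx, \qquad \mathcal{E}_{\mathrm{Schr}}(u) = \int_{\Omega} \Big( \tfrac{1}{2} |\nabla u|^2 + \tfrac{1}{2} V u^2 - f u \Big)\, dx,$$

while a PINN for a linear second order elliptic problem $\mathcal{L}u = f$ in $\Omega$, $u = g$ on $\partial\Omega$, minimizes an empirical residual loss of the form

$$\widehat{\mathcal{L}}(u_\theta) = \frac{1}{n} \sum_{i=1}^{n} \big|\mathcal{L} u_\theta(X_i) - f(X_i)\big|^2 + \frac{\lambda}{m} \sum_{j=1}^{m} \big| u_\theta(Y_j) - g(Y_j) \big|^2,$$

with interior samples $X_i$ drawn from $\Omega$ and boundary samples $Y_j$ drawn from $\partial\Omega$. The generalization bounds in the paper control the gap between such empirical objectives and their population counterparts.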
