Sup-Norm Convergence of Deep Neural Network Estimator for Nonparametric Regression by Adversarial Training (2307.04042v1)
Abstract: We show the sup-norm convergence of deep neural network estimators with a novel adversarial training scheme. For the nonparametric regression problem, it has been shown that an estimator using deep neural networks can achieve better performance in the sense of the $L^2$-norm. In contrast, it is difficult for a least-squares neural estimator to achieve sup-norm convergence, owing to the deep structure of neural network models. In this study, we develop an adversarial training scheme and investigate the sup-norm convergence of deep neural network estimators. First, we find that ordinary adversarial training makes neural estimators inconsistent. Second, we show that a deep neural network estimator achieves the optimal rate in the sup-norm sense under the proposed adversarial training with correction. We extend our adversarial training to general setups of a loss function and a data-generating function. Our experiments support the theoretical findings.
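To make the contrast in the abstract concrete, below is a minimal sketch of *ordinary* adversarial training for regression, the baseline scheme the paper shows to be inconsistent. It uses a PGD-style inner maximization over an $L^\infty$-bounded input perturbation around each covariate, followed by the usual outer least-squares minimization. The function name `adversarial_mse_loss`, the perturbation budget `eps`, and the PGD hyperparameters are illustrative assumptions, not the paper's exact procedure; in particular, the paper's proposed scheme adds a correction to this objective whose details are not reproduced here.

```python
import torch

def adversarial_mse_loss(model, x, y, eps=0.1, steps=5, step_size=0.02):
    """Ordinary adversarial training loss for regression (PGD-style sketch).

    Inner maximization: find a perturbation delta with ||delta||_inf <= eps
    that approximately maximizes the squared loss; the outer minimization
    then trains the network on the perturbed inputs.
    """
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = ((model(x + delta) - y) ** 2).mean()
        (grad,) = torch.autograd.grad(loss, delta)
        # Signed-gradient ascent step, projected back onto the L-inf ball.
        delta = (delta + step_size * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    # Outer objective: least squares on the adversarially perturbed inputs.
    return ((model(x + delta.detach()) - y) ** 2).mean()

if __name__ == "__main__":
    # Toy usage: fit a small ReLU network to noisy 1-D regression data.
    torch.manual_seed(0)
    x = torch.rand(256, 1)
    y = torch.sin(2 * torch.pi * x) + 0.1 * torch.randn_like(x)
    model = torch.nn.Sequential(
        torch.nn.Linear(1, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(200):
        opt.zero_grad()
        loss = adversarial_mse_loss(model, x, y)
        loss.backward()
        opt.step()
```

Intuitively, evaluating the loss on a worst-case perturbed input penalizes the fitted function's local oscillation, which is what connects adversarial training to sup-norm control; without the paper's correction, however, the perturbation also biases the estimator, which is the inconsistency the abstract refers to.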