On the Performance of Empirical Risk Minimization with Smoothed Data (2402.14987v1)
Abstract: In order to circumvent statistical and computational hardness results in sequential decision-making, recent work has considered smoothed online learning, where the distribution of data at each time is assumed to have bounded likelihood ratio with respect to a base measure when conditioned on the history. While previous works have demonstrated the benefits of smoothness, they have either assumed that the base measure is known to the learner or have presented computationally inefficient algorithms that apply only in special cases. This work investigates the more general setting where the base measure is \emph{unknown} to the learner, focusing in particular on the performance of Empirical Risk Minimization (ERM) with square loss when the data are well-specified and smooth. We show that in this setting, ERM is able to achieve sublinear error whenever a class is learnable with iid data; in particular, ERM achieves error scaling as $\tilde O( \sqrt{\mathrm{comp}(\mathcal F)\cdot T} )$, where $\mathrm{comp}(\mathcal F)$ is the statistical complexity of learning $\mathcal F$ with iid data. In so doing, we prove a novel norm comparison bound for smoothed data that constitutes the first sharp norm comparison for dependent data applying to arbitrary, nonlinear function classes. We complement these results with a lower bound indicating that our analysis of ERM is essentially tight, establishing a separation in the performance of ERM between smoothed and iid data.
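To make the setting and the guarantee above concrete, here is a minimal formalization sketch. It assumes the $\sigma$-smoothness convention standard in the smoothed online learning literature; the symbols $\sigma$, $\mu$, $p_t$, $f^\star$, $\xi_t$, and $\hat f_t$ are notation introduced here for illustration, and the displayed error functional is one natural reading of "error" rather than a definition quoted from the paper.

```latex
% Minimal formalization sketch (notation ours; assumes the standard
% sigma-smoothness convention and amsmath for the display math below).
% Smoothness: conditioned on the history, the law of the next covariate x_t
% has likelihood ratio at most 1/sigma with respect to the unknown base measure mu.
\[
  \left\| \frac{\mathrm{d}\,p_t(\,\cdot \mid x_{1:t-1}, y_{1:t-1})}{\mathrm{d}\mu} \right\|_{\infty}
  \;\le\; \frac{1}{\sigma}.
\]
% Well-specified square-loss data: y_t = f^*(x_t) + xi_t for some f^* in F,
% with conditionally mean-zero noise xi_t.  ERM refits the full history each round:
\[
  \hat f_t \;\in\; \arg\min_{f \in \mathcal F} \sum_{s < t} \bigl( f(x_s) - y_s \bigr)^2 .
\]
% One natural reading of the error bound stated in the abstract, with comp(F)
% the statistical complexity of learning F from iid data:
\[
  \sum_{t=1}^{T} \bigl( \hat f_t(x_t) - f^\star(x_t) \bigr)^2
  \;=\; \tilde O\!\left( \sqrt{\mathrm{comp}(\mathcal F)\cdot T} \right).
\]
```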