Refined Risk Bounds for Unbounded Losses via Transductive Priors (2410.21621v2)
Abstract: We revisit sequential variants of linear regression with the squared loss, classification with the hinge loss, and logistic regression, all characterized by unbounded losses, in a setup where no assumptions are made on the magnitude of the design vectors or on the norm of the optimal parameter vector. The key distinction from existing results lies in our assumption that the set of design vectors is known in advance (though their order is not), a setup sometimes referred to as transductive online learning. While this assumption seems similar to fixed-design regression or denoising, we demonstrate that the sequential nature of our algorithms allows us to convert our bounds into statistical ones with random design, without any additional assumptions on the distribution of the design vectors, something that is impossible for standard denoising results. Our key tools are based on the exponential weights algorithm with carefully chosen transductive (design-dependent) priors, which exploit the full horizon of the design vectors. Our classification regret bounds have a feature that the literature attributes only to bounded losses: they depend solely on the dimension of the parameter space and on the number of rounds, independently of the design vectors or the norm of the optimal solution. For linear regression with the squared loss, we further extend our analysis to the sparse case, providing sparsity regret bounds that additionally depend on the magnitude of the response variables. We argue that these improved bounds are specific to the transductive setting and unattainable in the worst-case sequential setup. In several cases, our algorithms admit polynomial-time approximations and reduce to sampling with respect to log-concave measures instead of aggregating over hard-to-construct $\varepsilon$-covers of classes.
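To make the core mechanism concrete, below is a minimal sketch of an exponential-weights forecaster for sequential linear regression with squared loss, where a design-dependent (transductive) Gaussian prior is built from the full set of design vectors known in advance and then approximated by Monte Carlo sampling. The function name `transductive_exp_weights`, the particular prior covariance $(X^\top X/T)^{-1}$, and the learning rate are illustrative assumptions for this sketch, not the paper's exact construction.

```python
import numpy as np

def transductive_exp_weights(X, y, eta=0.5, n_samples=2000, seed=0):
    """Illustrative exponential-weights forecaster for sequential linear
    regression with squared loss, using a design-dependent (transductive)
    Gaussian prior approximated by Monte Carlo samples.

    X : (T, d) array of design vectors, known in advance (transductive setup).
    y : (T,) array of responses, revealed one round at a time.
    eta : learning rate of the exponential-weights update (illustrative value).
    """
    T, d = X.shape
    # Design-dependent prior: zero-mean Gaussian whose covariance is the
    # inverse second-moment matrix of the *full* design. This is the
    # transductive ingredient: the prior sees every design vector up front.
    # (An illustrative choice, not necessarily the prior used in the paper.)
    G = X.T @ X / T + 1e-6 * np.eye(d)
    rng = np.random.default_rng(seed)
    thetas = rng.multivariate_normal(np.zeros(d), np.linalg.inv(G),
                                     size=n_samples)

    log_w = np.zeros(n_samples)  # log-weights of the sampled parameters
    preds = np.empty(T)
    for t in range(T):
        # Normalize weights (subtract the max for numerical stability).
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # Predict with the weighted average of the experts' predictions,
        # before y[t] is revealed.
        preds[t] = w @ (thetas @ X[t])
        # Exponential-weights update: penalize each sampled parameter by
        # its squared loss on the revealed response.
        log_w -= eta * (thetas @ X[t] - y[t]) ** 2
    return preds

if __name__ == "__main__":
    # Tiny usage example on synthetic data with a well-specified linear model.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 5))
    theta_star = rng.standard_normal(5)
    y = X @ theta_star + 0.1 * rng.standard_normal(200)
    preds = transductive_exp_weights(X, y)
    print("mean squared prediction error:", np.mean((preds - y) ** 2))
```

The once-and-for-all cloud of prior draws above is a crude stand-in for the approach the abstract describes: there, the algorithms reduce to sampling from log-concave (Gibbs-posterior) measures at each round, which is what yields polynomial-time approximations without constructing an $\varepsilon$-cover of the parameter class.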