Refined Risk Bounds for Unbounded Losses via Transductive Priors (2410.21621v2)

Published 29 Oct 2024 in stat.ML, cs.LG, math.ST, and stat.TH

Abstract: We revisit the sequential variants of linear regression with the squared loss, classification problems with hinge loss, and logistic regression, all characterized by unbounded losses in the setup where no assumptions are made on the magnitude of design vectors and the norm of the optimal vector of parameters. The key distinction from existing results lies in our assumption that the set of design vectors is known in advance (though their order is not), a setup sometimes referred to as transductive online learning. While this assumption seems similar to fixed design regression or denoising, we demonstrate that the sequential nature of our algorithms allows us to convert our bounds into statistical ones with random design without making any additional assumptions about the distribution of the design vectors--an impossibility for standard denoising results. Our key tools are based on the exponential weights algorithm with carefully chosen transductive (design-dependent) priors, which exploit the full horizon of the design vectors. Our classification regret bounds have a feature that is only attributed to bounded losses in the literature: they depend solely on the dimension of the parameter space and on the number of rounds, independent of the design vectors or the norm of the optimal solution. For linear regression with squared loss, we further extend our analysis to the sparse case, providing sparsity regret bounds that additionally depend on the magnitude of the response variables. We argue that these improved bounds are specific to the transductive setting and unattainable in the worst-case sequential setup. Our algorithms, in several cases, have polynomial time approximations and reduce to sampling with respect to log-concave measures instead of aggregating over hard-to-construct $\varepsilon$-covers of classes.


Summary

  • The paper introduces transductive priors to achieve improved risk bounds for unbounded losses in prediction tasks.
  • It refines regret bounds in logistic and linear regression by eliminating dependence on bounded norm assumptions while ensuring computational efficiency.
  • It distinguishes transductive from inductive setups, demonstrating significant improvements in sequential decision-making under uncertainty.

Refined Risk Bounds for Unbounded Losses via Transductive Priors: An Overview

This paper addresses the challenge of providing refined risk bounds for prediction tasks with unbounded losses, focusing on linear regression with squared loss, logistic regression, and classification with hinge loss. Its primary contribution is showing that, in a transductive framework where the set of design vectors is known in advance but their arrival order is not, markedly improved risk bounds can be achieved. The refinement holds even with unbounded losses and without assumptions on the magnitude of the design vectors or the norm of the optimal parameter vector, conditions that traditionally pose significant hurdles for deriving sharp risk bounds.

The paper works in the transductive online learning setting, which sits between traditional online learning, where examples arrive sequentially with nothing revealed in advance, and batch learning, where the entire labeled dataset is available upfront. The approach leverages prior knowledge of the set of design vectors to construct transductive (design-dependent) priors, which are pivotal to the stated improvements in risk bounds.
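
To make the setting concrete, the regret being controlled can be written schematically as below. The notation is ours rather than the paper's: $\ell$ stands for the squared, hinge, or logistic loss, $\sigma$ for the unknown arrival order of the known design vectors, and $\widehat{y}_t$ for the learner's (possibly improper) prediction at round $t$.

```latex
% Schematic transductive online regret (our notation, not verbatim from the paper).
% The learner knows the multiset {x_1, ..., x_T} in advance; at round t it sees
% x_{sigma(t)}, outputs a (possibly improper) prediction \hat{y}_t, and only then
% observes the label y_t.  Regret is measured against the best fixed linear parameter:
\mathrm{Reg}_T \;=\; \sum_{t=1}^{T} \ell\bigl(\widehat{y}_t,\, y_t\bigr)
\;-\; \inf_{\theta \in \mathbb{R}^d} \sum_{t=1}^{T} \ell\bigl(\langle \theta,\, x_{\sigma(t)} \rangle,\, y_t\bigr).
```

The headline classification bounds control this quantity using only the dimension and the number of rounds, with no dependence on the magnitude of the design vectors or the norm of the optimal parameter, a feature the paper notes is otherwise associated with bounded losses.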

Key Contributions

  1. Transductive Priors and Exponential Weights: The paper introduces transductive priors, design-dependent priors used within the exponential weights algorithm. By incorporating the complete set of design vectors into the prior, the algorithm exploits information about the full horizon and achieves regret and risk bounds beyond what design-agnostic priors allow (a schematic sketch is given after this list).
  2. Refinement of Regret Bounds: For logistic regression, the approach yields regret bounds that are independent of the norm of the optimal solution. This contrasts with existing methods, which typically require bounded norms. Additionally, for linear regression, including sparse cases, the authors not only improve the regret bounds but also maintain computational efficiency through log-concave sampling.
  3. Separation of Transductive and Inductive Setups: The work demonstrates clear separations in learnability when applying transductive setups versus traditional online learning frameworks. The results indicate scenarios where transductive learning methods provide sublinear regret bounds, highlighting their enhanced capability in handling unbounded losses.
  4. Computational Efficiency and Practicality: While maintaining theoretical rigor, the algorithms admit polynomial-time approximations in many instances, since prediction reduces to sampling from log-concave measures rather than aggregating over hard-to-construct ε-covers of the parameter class, which is computationally prohibitive.
  5. Statistical Implications and Batch Learning: Through a variant of the online-to-batch conversion technique, the authors translate their sequential regret bounds into statistical risk bounds under random design, without additional assumptions on the distribution of the design vectors, bridging the online-learning improvements to the classical batch setting.
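
To illustrate the flavor of item 1, below is a minimal Python sketch of exponential weights for online logistic regression with a design-dependent prior. The specific prior used here (a Gaussian whose covariance is shaped by the full design matrix) and the finite Monte Carlo grid of parameter candidates are illustrative assumptions of ours; the paper's actual construction aggregates over a continuous transductive prior and is implemented via log-concave sampling rather than a fixed grid.

```python
import numpy as np

def transductive_exp_weights_logistic(X, y_stream, eta=1.0, n_candidates=2000, seed=0):
    """Toy exponential-weights learner for online logistic regression with a
    design-dependent ("transductive") prior.  Illustrative sketch only.

    X            : (T, d) array of all design vectors, known in advance
                   (rows are assumed to arrive in index order for simplicity).
    y_stream     : iterable of T labels in {-1, +1}, revealed one per round.
    eta          : exponential-weights learning rate.
    n_candidates : number of parameters drawn from the prior (a crude Monte
                   Carlo stand-in for exact aggregation / log-concave sampling).
    """
    rng = np.random.default_rng(seed)
    T, d = X.shape

    # Design-dependent prior: a zero-mean Gaussian whose covariance is the
    # regularized inverse second-moment matrix of the full design set.
    # This is an assumed illustrative choice, not the paper's exact prior.
    second_moment = X.T @ X / T + 1e-6 * np.eye(d)
    prior_cov = np.linalg.inv(second_moment)
    thetas = rng.multivariate_normal(np.zeros(d), prior_cov, size=n_candidates)

    log_w = np.zeros(n_candidates)   # log-weights over parameter candidates
    cumulative_loss = 0.0

    for t, y_t in enumerate(y_stream):
        x_t = X[t]
        margins = thetas @ x_t                      # candidate margins <theta, x_t>

        # Normalized posterior weights from the accumulated log-weights.
        w = np.exp(log_w - log_w.max())
        w /= w.sum()

        # Aggregated (improper) probability that the label is +1.
        p_plus = float(w @ (1.0 / (1.0 + np.exp(-margins))))
        p_plus = min(max(p_plus, 1e-12), 1.0 - 1e-12)
        cumulative_loss += -np.log(p_plus if y_t == 1 else 1.0 - p_plus)

        # Exponential-weights update with each candidate's logistic loss.
        log_w -= eta * np.logaddexp(0.0, -y_t * margins)

    return cumulative_loss, thetas, log_w
```

An online-to-batch conversion in the spirit of item 5 would then, for instance, return a predictor averaged over (or drawn at random from) the per-round aggregated predictors, turning the sequential guarantee into a statistical risk bound under random design; the exact conversion used in the paper differs in its details.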

Implications and Future Directions

The implications of this research are substantial for fields relying on sequential decision-making under uncertainty, such as finance, healthcare, and autonomous systems. By removing boundedness assumptions on the losses and on the parameter norm, the resulting guarantees are more robust and more widely applicable.

Theoretically, this work also stimulates further inquiries into the potential of less-constrained models in other prediction scenarios, possibly extending beyond linear problems to broader, non-parametric settings. As such, future work could explore the universality of transductive priors across various other loss types and their impact on regret minimization strategies.

In summary, this paper charts a path toward robust learning guarantees under unbounded losses by leveraging transductive priors. It advances the theory of sequential prediction while also pointing to practical algorithms, based on log-concave sampling, that exploit knowledge of the full set of design vectors.
