$H$-Consistency Guarantees for Regression (2403.19480v1)

Published 28 Mar 2024 in cs.LG and stat.ML

Abstract: We present a detailed study of $H$-consistency bounds for regression. We first present new theorems that generalize the tools previously given to establish $H$-consistency bounds. This generalization proves essential for analyzing $H$-consistency bounds specific to regression. Next, we prove a series of novel $H$-consistency bounds for surrogate loss functions of the squared loss, under the assumption of a symmetric distribution and a bounded hypothesis set. This includes positive results for the Huber loss, all $\ell_p$ losses, $p \geq 1$, and the squared $\epsilon$-insensitive loss, as well as a negative result for the $\epsilon$-insensitive loss used in Support Vector Regression (SVR). We further leverage our analysis of $H$-consistency for regression and derive principled surrogate losses for adversarial regression (Section 5). This readily establishes novel algorithms for adversarial regression, for which we report favorable experimental results in Section 6.
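
For context, an $H$-consistency bound for a surrogate loss $\ell$ of the squared loss $\ell_2$ is, roughly, a non-asymptotic guarantee of the form $\mathcal{E}_{\ell_2}(h) - \mathcal{E}^*_{\ell_2}(H) \leq f\big(\mathcal{E}_{\ell}(h) - \mathcal{E}^*_{\ell}(H)\big)$ holding uniformly over the hypothesis set $H$ (the paper's statements additionally involve minimizability gaps). The surrogate losses named in the abstract are the standard ones; below is a minimal NumPy sketch of their textbook definitions, with illustrative parameter values (delta, eps) that are not taken from the paper:

```python
import numpy as np

def huber_loss(r, delta=1.0):
    # Huber loss on the residual r = h(x) - y:
    # quadratic near zero, linear in the tails.
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def lp_loss(r, p=1.5):
    # l_p loss |h(x) - y|^p for p >= 1 (p = 2 recovers the squared loss).
    return np.abs(r) ** p

def eps_insensitive(r, eps=0.1):
    # epsilon-insensitive loss max(0, |h(x) - y| - eps), the SVR loss
    # for which the paper reports a negative result.
    return np.maximum(0.0, np.abs(r) - eps)

def squared_eps_insensitive(r, eps=0.1):
    # squared epsilon-insensitive loss max(0, |h(x) - y| - eps)^2,
    # one of the losses with a positive result.
    return np.maximum(0.0, np.abs(r) - eps) ** 2
```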
