Agnostically Learning Single-Index Models using Omnipredictors (2306.10615v1)

Published 18 Jun 2023 in cs.LG, cs.DS, and stat.ML

Abstract: We give the first result for agnostically learning Single-Index Models (SIMs) with arbitrary monotone and Lipschitz activations. All prior work either held only in the realizable setting or required the activation to be known. Moreover, we only require the marginal to have bounded second moments, whereas all prior work required stronger distributional assumptions (such as anticoncentration or boundedness). Our algorithm is based on recent work by [GHK$+$23] on omniprediction using predictors satisfying calibrated multiaccuracy. Our analysis is simple and relies on the relationship between Bregman divergences (or matching losses) and $\ell_p$ distances. We also provide new guarantees for standard algorithms like GLMtron and logistic regression in the agnostic setting.
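
For context on the abstract's terminology (standard background from the cited prior work, not the paper's new algorithm): for a strictly increasing activation $u$ with potential $U(z) = \int_0^z u(s)\,ds$, the matching loss is the Bregman divergence

\[
\ell_u(z, y) \;=\; \int_{u^{-1}(y)}^{z} \big(u(s) - y\big)\,ds \;=\; U(z) - U\big(u^{-1}(y)\big) - y\big(z - u^{-1}(y)\big),
\]

which specializes to the squared loss $(z - y)^2/2$ for $u(z) = z$ and to the logistic loss for the sigmoid. The GLMtron baseline revisited in the last sentence is the algorithm of Kakade, Kalai, Kanade, and Shamir [18, below]; the following is a minimal NumPy sketch of its standard residual update, with illustrative hyperparameters and synthetic data, and a best-iterate-on-training-error shortcut in place of the original's held-out selection.

```python
import numpy as np

def glmtron(X, y, activation, n_iters=100):
    """Standard GLMtron update [18]: for a known monotone, 1-Lipschitz
    activation u, iterate w <- w + mean_i (y_i - u(<w, x_i>)) x_i.
    Sketch only: we track training squared error instead of the
    held-out iterate selection used in the original analysis."""
    n, d = X.shape
    w = np.zeros(d)
    best_w, best_err = w.copy(), np.inf
    for _ in range(n_iters):
        residual = y - activation(X @ w)   # y_i - u(<w, x_i>)
        w = w + (X.T @ residual) / n       # step size 1; enough for 1-Lipschitz u
        err = np.mean((activation(X @ w) - y) ** 2)
        if err < best_err:
            best_err, best_w = err, w.copy()
    return best_w

# Toy usage: recover a ReLU single-index model from noisy samples.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5)) / np.sqrt(5)   # keep ||x|| roughly bounded
w_true = rng.standard_normal(5)
w_true /= np.linalg.norm(w_true)
relu = lambda z: np.maximum(z, 0.0)
y = relu(X @ w_true) + 0.05 * rng.standard_normal(2000)
w_hat = glmtron(X, y, relu)
```

Note that GLMtron assumes the activation $u$ is known; the abstract's point is that the omniprediction-based algorithm removes exactly this requirement, along with the stronger distributional assumptions.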

References (23)
  1. Alan Agresti. Foundations of linear and generalized linear models. John Wiley & Sons, 2015.
  2. Exponentially many local minima for single neurons. In D. Touretzky, M.C. Mozer, and M. Hasselmo, editors, Advances in Neural Information Processing Systems, volume 8. MIT Press, 1995.
  3. Approximation schemes for ReLU regression. In Conference on Learning Theory, pages 1452–1485. PMLR, 2020.
  4. Hardness of learning a single neuron with adversarial label noise. In Gustau Camps-Valls, Francisco J. R. Ruiz, and Isabel Valera, editors, Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pages 8199–8213. PMLR, 28–30 Mar 2022.
  5. The optimality of polynomial regression for agnostic learning under Gaussian marginals in the SQ model. In Mikhail Belkin and Samory Kpotufe, editors, Proceedings of Thirty Fourth Conference on Learning Theory, volume 134 of Proceedings of Machine Learning Research, pages 1552–1584. PMLR, 15–19 Aug 2021.
  6. Non-convex SGD learns halfspaces with adversarial label noise. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 18540–18549. Curran Associates, Inc., 2020.
  7. Learning a single neuron with adversarial label noise via gradient descent. In Po-Ling Loh and Maxim Raginsky, editors, Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pages 4313–4361. PMLR, 02–05 Jul 2022.
  8. Near-optimal SQ lower bounds for agnostically learning halfspaces and ReLUs under Gaussian marginals. Advances in Neural Information Processing Systems, 33:13586–13596, 2020.
  9. Agnostic learning of a single neuron with gradient descent. Advances in Neural Information Processing Systems, 33:5417–5428, 2020.
  10. Statistical-query lower bounds via functional gradients. Advances in Neural Information Processing Systems, 33:2147–2158, 2020.
  11. Loss minimization through the lens of outcome indistinguishability. In Yael Tauman Kalai, editor, 14th Innovations in Theoretical Computer Science Conference, ITCS 2023, January 10-13, 2023, MIT, Cambridge, Massachusetts, USA, volume 251 of LIPIcs, pages 60:1–60:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023.
  12. Time/accuracy tradeoffs for learning a ReLU with respect to Gaussian marginals. Advances in Neural Information Processing Systems, 32, 2019.
  13. Reliably learning the ReLU in polynomial time. In Conference on Learning Theory, pages 1004–1042. PMLR, 2017.
  14. Omnipredictors. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2022.
  15. Higher order concentration for functions of weakly dependent random variables. Electronic Journal of Probability, 24:1–19, 2019.
  16. Multicalibration: Calibration for the (Computationally-identifiable) masses. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 1939–1948. PMLR, 10–15 Jul 2018.
  17. Multiaccuracy: Black-box post-processing for fairness in classification. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 247–254, 2019.
  18. Efficient learning of generalized linear and single index models with isotonic regression. Advances in Neural Information Processing Systems, 24, 2011.
  19. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464–497, 1994.
  20. The Isotron algorithm: High-dimensional isotonic regression. In COLT 2009 - The 22nd Conference on Learning Theory, Montreal, Quebec, Canada, June 18-21, 2009.
  21. Peter McCullagh. Generalized linear models. European Journal of Operational Research, 16(3):285–292, 1984.
  22. The Chow parameters problem. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, pages 517–526, 2008.
  23. Learning kernel-based halfspaces with the 0-1 loss. SIAM Journal on Computing, 40(6):1623–1646, 2011.