Replicable Learning of Large-Margin Halfspaces (2402.13857v2)
Abstract: We provide efficient replicable algorithms for the problem of learning large-margin halfspaces. Our results improve upon the algorithms provided by Impagliazzo, Lei, Pitassi, and Sorrell [STOC, 2022]. We design the first dimension-independent replicable algorithms for this task that run in polynomial time, are proper, and have strictly improved sample complexity compared to the algorithm of Impagliazzo et al. [2022] with respect to all the relevant parameters. Moreover, our first algorithm has sample complexity that is optimal with respect to the accuracy parameter $\epsilon$. We also design an SGD-based replicable algorithm that, in some parameter regimes, achieves better sample and time complexity than our first algorithm. Departing from the requirement of polynomial-time algorithms, using the DP-to-Replicability reduction of Bun, Gaboardi, Hopkins, Impagliazzo, Lei, Pitassi, Sorrell, and Sivakumar [STOC, 2023], we show how to obtain a replicable algorithm for large-margin halfspaces with improved sample complexity with respect to the margin parameter $\tau$, but with running time doubly exponential in $1/\tau^2$ and a worse sample complexity dependence on $\epsilon$ than one of our previous algorithms. We then design an improved algorithm with better sample complexity than all three of our previous algorithms and running time exponential in $1/\tau^2$.
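To make the replicability requirement concrete: a replicable learner, run on two independent samples from the same distribution but with the same shared internal randomness, must output the identical hypothesis with high probability. A standard ingredient (used, e.g., in Impagliazzo et al. [2022]) is to train an approximate hypothesis and then snap it to a randomly shifted grid determined by the shared randomness, so that nearby trained vectors collapse to the same output. The toy sketch below illustrates only this rounding idea on a perceptron-style learner; it is not the paper's algorithm, and all function names, the grid width, and the seeds are illustrative assumptions.

```python
import random


def sgd_halfspace(data, lr=0.1, epochs=20, seed=0):
    """Perceptron-style SGD for a halfspace; data is a list of (x, y), y in {-1, +1}."""
    rng = random.Random(seed)
    data = list(data)          # copy so we never mutate the caller's list
    d = len(data[0][0])
    w = [0.0] * d
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin <= 0:    # mistake: standard perceptron update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w


def replicable_round(w, grid=0.5, shared_seed=42):
    """Round each coordinate to a randomly shifted grid driven by SHARED randomness.

    Two runs that produce nearby weight vectors land in the same grid cell with
    high probability over the shared shift, so they output the same hypothesis.
    (Hypothetical helper; grid width and seed are illustrative.)
    """
    rng = random.Random(shared_seed)
    rounded = []
    for wi in w:
        shift = rng.uniform(0.0, grid)            # shared across both runs
        rounded.append(grid * round((wi - shift) / grid) + shift)
    return tuple(rounded)
```

As a usage sketch, one would train on each sample independently but round both results with the same `shared_seed`; the rounded tuple, not the raw SGD iterate, is the algorithm's output.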
- Dimitris Achlioptas. Database-friendly random projections. In Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 274–281, 2001.
- Optimal compression of approximate inner products and dimension reduction. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 639–650. IEEE, 2017.
- Monya Baker. 1,500 scientists lift the lid on reproducibility. Nature, 533(7604), 2016.
- Philip Ball. Is AI leading to a reproducibility crisis in science? Nature, 624(7990):22–25, 2023.
- Differentially private learning with margin guarantees. Advances in Neural Information Processing Systems, 35:32127–32141, 2022a.
- Open problem: Better differentially private learning algorithms with margin guarantees. In Conference on Learning Theory, pages 5638–5643. PMLR, 2022b.
- Private center points and learning of halfspaces. In Conference on Learning Theory, pages 269–282. PMLR, 2019.
- Noise-tolerant learning, the parity problem, and the statistical query model. Journal of the ACM (JACM), 50(4):506–519, 2003.
- Practical privacy: the SuLQ framework. In Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 128–138, 2005.
- Sébastien Bubeck et al. Convex optimization: Algorithms and complexity. Foundations and Trends® in Machine Learning, 8(3-4):231–357, 2015.
- Efficient, noise-tolerant, and private learning via boosting. In Conference on Learning Theory, pages 1031–1077. PMLR, 2020.
- Stability is stable: Connections between replicability, privacy, and adaptive generalization. arXiv preprint arXiv:2303.12921, 2023.
- Local Borsuk-Ulam, stability, and replicability. arXiv preprint arXiv:2311.01599, 2023a.
- Replicability and stability in learning. arXiv preprint arXiv:2304.03757, 2023b.
- Support-vector networks. Machine learning, 20:273–297, 1995.
- Nearly tight bounds for robust proper learning of halfspaces with a margin. Advances in Neural Information Processing Systems, 32, 2019.
- Information-computation tradeoffs for learning margin halfspaces with random classification noise. In The Thirty Sixth Annual Conference on Learning Theory, pages 2211–2239. PMLR, 2023.
- List and certificate complexities in replicable learning. arXiv preprint arXiv:2304.02240, 2023.
- Replicable reinforcement learning. arXiv preprint arXiv:2305.15284, 2023.
- Replicable bandits. In The Eleventh International Conference on Learning Representations, 2023a.
- Replicable clustering. arXiv preprint arXiv:2302.10359, 2023b.
- Efficient algorithms for learning from coarse labels. In Conference on Learning Theory, pages 2060–2079. PMLR, 2021.
- Casper Benjamin Freksen. An introduction to Johnson-Lindenstrauss transforms. arXiv preprint arXiv:2103.00564, 2021.
- A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences, 55(1):119–139, 1997.
- Large margin classification using the perceptron algorithm. In Proceedings of the eleventh annual conference on Computational learning theory, pages 209–217, 1998.
- User-level private learning via correlated sampling. arXiv preprint arXiv:2110.11208, 2021.
- Statistical-query lower bounds via functional gradients. Advances in Neural Information Processing Systems, 33:2147–2158, 2020.
- Near-tight margin-based generalization bounds for support vector machines. In International Conference on Machine Learning, pages 3779–3788. PMLR, 2020.
- Privately releasing conjunctions and the statistical query barrier. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 803–812, 2011.
- Reproducibility in learning. arXiv preprint arXiv:2201.08430, 2022.
- Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 604–613, 1998.
- William B Johnson. Extensions of Lipschitz mappings into a Hilbert space. In Conference on modern analysis and probability, 1984, pages 189–206, 1984.
- Statistical indistinguishability of learning algorithms. arXiv preprint arXiv:2305.14311, 2023.
- Private learning of halfspaces: Simplifying the construction and reducing the sample complexity. Advances in Neural Information Processing Systems, 33:13976–13985, 2020.
- Replicability in reinforcement learning. arXiv preprint arXiv:2305.19562, 2023.
- Spherical cubes: optimal foams from computational hardness amplification. Communications of the ACM, 55(10):90–97, 2012.
- Sub-sampled cubic regularization for non-convex optimization. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1895–1904. PMLR, 06–11 Aug 2017. URL https://proceedings.mlr.press/v70/kohler17a.html.
- Efficient private algorithms for learning large-margin halfspaces. In Algorithmic Learning Theory, pages 704–724. PMLR, 2020.
- The unstable formula theorem revisited. arXiv preprint arXiv:2212.05050, 2022.
- The bayesian stability zoo. arXiv preprint arXiv:2310.18428, 2023.
- ICLR Reproducibility Challenge 2019. ReScience C, 5(2):5, 2019.
- Frank Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review, 65(6):386, 1958.
- Leslie G Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
- Vladimir Vapnik. The nature of statistical learning theory. Springer Science & Business Media, 1999.
- Vladimir Vapnik. Estimation of dependences based on empirical data. Springer Science & Business Media, 2006.