Insufficient Statistics Perturbation: Stable Estimators for Private Least Squares (2404.15409v1)
Published 23 Apr 2024 in cs.LG, cs.CR, and stat.ML
Abstract: We present a sample- and time-efficient differentially private algorithm for ordinary least squares, with error that depends linearly on the dimension and is independent of the condition number of $X^\top X$, where $X$ is the design matrix. All prior private algorithms for this task require either $d^{3/2}$ examples, error growing polynomially with the condition number, or exponential time. Our near-optimal accuracy guarantee holds for any dataset with bounded statistical leverage and bounded residuals. Technically, we build on the approach of Brown et al. (2023) for private mean estimation, adding scaled noise to a carefully designed stable nonprivate estimator of the empirical regression vector.
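The guarantee in the abstract is stated in terms of statistical leverage and residuals. As a minimal sketch, assuming NumPy and synthetic data (the function name `ols_diagnostics`, the data, and the column scalings are illustrative and not taken from the paper), the following computes both quantities for ordinary least squares and checks that they do not change when the columns of $X$ are rescaled, which changes the condition number of $X^\top X$. It is not the paper's private algorithm and adds no noise.

```python
import numpy as np

def ols_diagnostics(X, y):
    """Return OLS coefficients, leverage scores, and residuals.

    Leverage of row i is h_i = x_i^T (X^T X)^{-1} x_i, the i-th diagonal
    entry of the hat matrix. Assumes X has full column rank.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    gram = X.T @ X                             # d x d Gram matrix X^T X
    beta = np.linalg.solve(gram, X.T @ y)      # normal equations
    gram_inv_Xt = np.linalg.solve(gram, X.T)   # (X^T X)^{-1} X^T, shape (d, n)
    leverage = np.einsum("ij,ji->i", X, gram_inv_Xt)
    residuals = y - X @ beta
    return beta, leverage, residuals

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.standard_normal((n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)
beta, leverage, residuals = ols_diagnostics(X, y)

# Rescale the columns so that X^T X becomes much worse conditioned.
X_scaled = X * np.array([1.0, 100.0, 0.01])
_, leverage_scaled, residuals_scaled = ols_diagnostics(X_scaled, y)

print("condition number before:", np.linalg.cond(X.T @ X))
print("condition number after: ", np.linalg.cond(X_scaled.T @ X_scaled))
print("max leverage:", leverage.max(), "max |residual|:", np.abs(residuals).max())
```

For a full-rank design the leverage scores lie in $[0, 1]$ and sum to $d$, so their average is $d/n$. Both the leverages and the residuals depend only on the column space of $X$, which is why conditions phrased in these terms are unaffected by how badly $X^\top X$ is conditioned.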
- On the sample complexity of privately learning unbounded high-dimensional Gaussians. In Algorithmic Learning Theory, pages 185–216. Proceedings of Machine Learning Research, 2021.
- Differentially private simple linear regression. Proceedings on Privacy Enhancing Technologies, 2022.
- Privately estimating a Gaussian: Efficient, robust, and optimal. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 483–496, 2023.
- Francis J Anscombe. Graphs in statistical analysis. The American Statistician, 27(1):17–21, 1973.
- Private and polynomial time algorithms for learning Gaussians and beyond. In Conference on Learning Theory, pages 1075–1076. Proceedings of Machine Learning Research, 2022.
- Trimmed maximum likelihood estimation for robust generalized linear model. In Advances in Neural Information Processing Systems, volume 35, pages 862–873. Curran Associates, Inc., 2022.
- Robust linear regression: Optimal rates in polynomial time. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 102–115, 2021.
- Private empirical risk minimization: Efficient algorithms and tight error bounds. In 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pages 464–473. IEEE, 2014.
- Regression diagnostics: Identifying influential data and sources of collinearity. John Wiley & Sons, 2005.
- Robust regression via hard thresholding. Advances in Neural Information Processing Systems, 28, 2015.
- Consistent robust regression. Advances in Neural Information Processing Systems, 30, 2017.
- CoinPress: Practical private mean and covariance estimation. arXiv preprint arXiv:2006.06618, 2020.
- Covariance-aware private mean estimation without private covariance estimation. Advances in Neural Information Processing Systems, 34:7950–7964, 2021.
- Fast, sample-efficient, affine-invariant private mean and covariance estimation for subgaussian distributions. In The Thirty Sixth Annual Conference on Learning Theory, pages 5578–5579. Proceedings of Machine Learning Research, 2023.
- Private gradient descent for linear regression: Tighter error bounds and instance-specific uncertainty estimation. arXiv preprint arXiv:2402.13531, 2024.
- Average-case averages: Private algorithms for smooth sensitivity and mean estimation. arXiv preprint arXiv:1906.02830, 2019.
- Private hypothesis selection. In Advances in Neural Information Processing Systems, pages 156–167, 2019.
- The cost of privacy: Optimal rates of convergence for parameter estimation with differential privacy. The Annals of Statistics, 49(5):2825–2850, 2021.
- Score attack: A lower bound technique for optimal differentially private learning. arXiv preprint arXiv:2303.07152, 2023.
- Online and distribution-free robustness: Regression and contextual bandits with Huber contamination. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 684–695. IEEE, 2022.
- Robust and private Bayesian inference. In International Conference on Algorithmic Learning Theory, pages 291–305. Springer, 2014.
- Differential privacy and robust statistics. In Proceedings of the 41st ACM Symposium on Theory of Computing, STOC ’09, pages 371–380. ACM, 2009.
- Calibrating noise to sensitivity in private data analysis. In Shai Halevi and Tal Rabin, editors, Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006, Proceedings, volume 3876 of Lecture Notes in Computer Science, pages 265–284. Springer, 2006. URL https://doi.org/10.1007/11681878_14.
- Analyze Gauss: Optimal bounds for privacy-preserving principal component analysis. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, pages 11–20, 2014.
- On the theory and practice of privacy-preserving Bayesian data analysis. arXiv preprint arXiv:1603.07294, 2016.
- The hat matrix in regression and ANOVA. The American Statistician, 32(1):17–22, 1978.
- Efficient mean estimation with pure differential privacy via a sum-of-squares exponential mechanism. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 1406–1417, 2022.
- An analysis of random design linear regression. arXiv preprint arXiv:1106.2363, 2011.
- Peter J Huber. Robust statistics. In International Encyclopedia of Statistical Science, pages 1248–1251. Springer, 2011.
- Robust sub-Gaussian principal component analysis and width-independent Schatten packing. Advances in Neural Information Processing Systems, 33:15689–15701, 2020.
- Privately learning high-dimensional distributions. In Conference on Learning Theory, pages 1853–1902. Proceedings of Machine Learning Research, 2019.
- A private and computationally-efficient estimator for unbounded Gaussians. In Conference on Learning Theory, pages 544–572. Proceedings of Machine Learning Research, 2022.
- Finite sample differentially private confidence intervals. arXiv preprint arXiv:1711.03908, 2017.
- Finite sample differentially private confidence intervals. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), volume 94, page 44. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2018.
- Private convex empirical risk minimization and high-dimensional regression. In Conference on Learning Theory, pages 25–1. JMLR Workshop and Conference Proceedings, 2012.
- Efficient algorithms for outlier-robust regression. In Conference On Learning Theory, pages 1420–1430. PMLR, 2018.
- A pretty fast algorithm for adaptive private mean estimation. In Conference on Learning Theory, pages 2511–2551. Proceedings of Machine Learning Research, 2023.
- Robust and differentially private mean estimation. Advances in Neural Information Processing Systems, 34:3887–3901, 2021.
- Differential privacy and robust statistics in high dimensions. In Conference on Learning Theory, pages 1167–1246. Proceedings of Machine Learning Research, 2022.
- Label robust and differentially private linear regression: Computational and statistical efficiency. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
- A second course in statistics: regression analysis, volume 6. Prentice Hall Upper Saddle River, NJ, 2003.
- Differential privacy without sensitivity. In Advances in Neural Information Processing Systems, pages 956–964, 2016.
- Darakhshan J Mir. Differential privacy: an exploration of the privacy-utility landscape. Rutgers The State University of New Jersey-New Brunswick, 2013.
- Shyam Narayanan. Better and simpler lower bounds for differentially private statistical estimation. arXiv preprint arXiv:2310.06289, 2023.
- Robust regression with covariate filtering: Heavy tails and adversarial contamination. arXiv preprint arXiv:2009.12976, 2020.
- Or Sheffet. Differentially private ordinary least squares. In International Conference on Machine Learning, pages 3105–3114. PMLR, 2017.
- Or Sheffet. Old techniques in differentially private linear regression. In Algorithmic Learning Theory, pages 789–827. PMLR, 2019.
- Iterative least trimmed squares for mixed linear regression. Advances in Neural Information Processing Systems, 32, 2019a.
- Learning with bad training data via iterative trimmed loss minimization. In International Conference on Machine Learning, pages 5739–5748. PMLR, 2019b.
- Adaptive hard thresholding for near-optimal consistent robust regression. In Conference on Learning Theory, pages 2892–2897. PMLR, 2019.
- FriendlyCore: Practical differentially private aggregation. In International Conference on Machine Learning, pages 21828–21863. Proceedings of Machine Learning Research, 2022.
- Salil Vadhan. The complexity of differential privacy. In Tutorials on the Foundations of Cryptography, pages 347–450. Springer, 2017.
- (Nearly) optimal private linear regression for sub-Gaussian data via adaptive clipping. In Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pages 1126–1166. PMLR, 02–05 Jul 2022.
- Efficient computing of regression diagnostics. The American Statistician, 35(4):234–242, 1981.
- Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge University Press, 2018.
- Differential privacy for clinical trial data: Preliminary evaluations. In 2009 IEEE International Conference on Data Mining Workshops, pages 138–143. IEEE, 2009.
- Yu-Xiang Wang. Revisiting differentially private linear regression: optimal and adaptive prediction & estimation in unbounded domain. arXiv preprint arXiv:1803.02596, 2018.
- Privacy for free: Posterior sampling and stochastic gradient Monte Carlo. In International Conference on Machine Learning, pages 2493–2502. PMLR, 2015.