
Distributed Semi-Supervised Sparse Statistical Inference (2306.10395v2)

Published 17 Jun 2023 in stat.ML and cs.LG

Abstract: The debiased estimator is a crucial tool in statistical inference for high-dimensional model parameters. However, constructing such an estimator involves estimating the high-dimensional inverse Hessian matrix, which incurs significant computational cost. The challenge becomes particularly acute in distributed setups, where traditional methods require computing a debiased estimator on every machine and quickly become unwieldy as the number of machines grows. In this paper, we study semi-supervised sparse statistical inference in a distributed setup. We develop an efficient multi-round distributed debiased estimator that integrates both labeled and unlabeled data, and we show that the additional unlabeled data improves the statistical rate at each round of iteration. Our approach offers tailored debiasing methods for $M$-estimation and generalized linear models according to the specific form of the loss function, and it also applies to non-smooth losses such as the absolute deviation loss. Furthermore, our algorithm is computationally efficient because it requires only one estimation of a high-dimensional inverse covariance matrix. We demonstrate the effectiveness of our method through simulation studies and real data applications that highlight the benefits of incorporating unlabeled data.
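
To make the role of the inverse covariance (precision) matrix concrete, the sketch below walks through the classical single-machine debiased (de-sparsified) lasso, in which an initial sparse estimate is bias-corrected as $\hat\beta + \hat\Theta X^\top(y - X\hat\beta)/n$ with $\hat\Theta$ approximating the inverse covariance matrix. It is only an illustration of the generic debiasing step the abstract refers to, not the paper's distributed semi-supervised algorithm; the simulated data, the penalty level, and the ridge-regularized inverse used for $\hat\Theta$ are placeholder choices (a practical implementation would use, e.g., a nodewise-lasso or CLIME-type precision-matrix estimate).

```python
# Minimal single-machine sketch of the classical debiased (de-sparsified) lasso.
# Illustrative only: not the paper's distributed semi-supervised procedure.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 500, 5                      # samples, dimension, sparsity (toy sizes)
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = 1.0
y = X @ beta_true + rng.standard_normal(n)

# Step 1: initial sparse estimate (lasso); the penalty level is a placeholder.
beta_hat = Lasso(alpha=0.1, max_iter=10_000).fit(X, y).coef_

# Step 2: estimate Theta ~ (X^T X / n)^{-1}. A ridge-regularized inverse is used
# purely to keep the sketch short; CLIME or nodewise lasso would be used in practice.
Sigma_hat = X.T @ X / n
Theta_hat = np.linalg.inv(Sigma_hat + 0.1 * np.eye(p))

# Step 3: one-step bias correction using the score (residual) term.
beta_debiased = beta_hat + Theta_hat @ (X.T @ (y - X @ beta_hat)) / n
```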

