Distributed High-Dimensional Quantile Regression: Estimation Efficiency and Support Recovery (2405.07552v3)
Abstract: In this paper, we study distributed estimation and support recovery for high-dimensional linear quantile regression. Quantile regression is a popular alternative to least squares regression because of its robustness to outliers and data heterogeneity. However, the non-smoothness of the check loss function poses significant challenges to both computation and theory in the distributed setting. To tackle these problems, we transform the original quantile regression problem into a least-squares optimization. By applying a double-smoothing approach, we extend a previous Newton-type distributed method without requiring the restrictive assumption that the error term is independent of the covariates. We develop an efficient algorithm with high computational and communication efficiency. Theoretically, the proposed distributed estimator achieves a near-oracle convergence rate and high support recovery accuracy after a constant number of iterations. Extensive experiments on synthetic examples and a real-data application further demonstrate the effectiveness of the proposed method.
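The key idea the abstract describes is replacing the non-differentiable check loss with a smoothed surrogate so that gradient- and Newton-type updates are well defined. The sketch below is a minimal, single-machine illustration of that general idea, not the paper's double-smoothing algorithm: it fits an ℓ1-penalized quantile regression whose check loss has been convolved with a Gaussian kernel of bandwidth `h`, solved by proximal gradient descent. The function names, the kernel choice, and the default parameters are illustrative assumptions, not anything specified in the paper.

```python
# A minimal sketch of convolution-smoothed, l1-penalized quantile
# regression fit by proximal gradient (ISTA). Assumptions: Gaussian
# kernel smoothing, fixed bandwidth h, and all names/defaults below.
import numpy as np
from scipy.stats import norm


def soft_threshold(z, t):
    """Elementwise soft-thresholding: the prox operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_smoothed_qr(X, y, tau=0.5, lam=0.1, h=0.5, step=None, n_iter=500):
    """Minimize (1/n) * sum_i l_h(y_i - x_i' beta) + lam * ||beta||_1,
    where l_h is the check loss rho_tau convolved with a Gaussian kernel
    of bandwidth h, making the data-fitting term differentiable with
    derivative l_h'(u) = tau - 1 + Phi(u / h).
    """
    n, p = X.shape
    beta = np.zeros(p)
    if step is None:
        # The smoothed gradient is Lipschitz with constant at most
        # ||X||_2^2 / (n * h * sqrt(2*pi)) for the Gaussian kernel.
        L = np.linalg.norm(X, 2) ** 2 / (n * h * np.sqrt(2.0 * np.pi))
        step = 1.0 / L
    for _ in range(n_iter):
        r = y - X @ beta
        # Chain rule: grad = -(1/n) * X' * l_h'(r).
        grad = -X.T @ (tau - 1.0 + norm.cdf(r / h)) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


if __name__ == "__main__":
    # Synthetic example: sparse truth with 3 active coefficients.
    rng = np.random.default_rng(0)
    n, p = 500, 50
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:3] = [2.0, -1.5, 1.0]
    y = X @ beta_true + rng.standard_normal(n)
    beta_hat = fit_smoothed_qr(X, y, tau=0.5, lam=0.05)
    print(np.flatnonzero(np.abs(beta_hat) > 1e-3))  # recovered support
```

In a distributed variant of this sketch, each machine would evaluate the smoothed gradient on its local data shard and communicate only length-p vectors per round, which is the kind of communication efficiency the abstract refers to; the paper's actual estimator additionally uses a Newton-type (second-order) update.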