High-Dimensional Tail Index Regression: with An Application to Text Analyses of Viral Posts in Social Media (2403.01318v2)
Abstract: Motivated by the empirical observation of power-law distributions in the credits (e.g., "likes") of viral social media posts, we introduce a high-dimensional tail index regression model and propose methods for estimation and inference of its parameters. First, we present a regularized estimator, establish its consistency, and derive its convergence rate. Second, we introduce a debiasing technique for the regularized estimator to facilitate inference and prove its asymptotic normality. Third, we extend our approach to handle large-scale online streaming data using stochastic gradient descent. Simulation studies corroborate our theoretical findings. We apply these methods to the text analysis of viral posts on X (formerly Twitter) related to LGBTQ+ topics.