
Least Squares Inference for Data with Network Dependency (2404.01977v1)

Published 2 Apr 2024 in stat.ME, math.ST, and stat.TH

Abstract: We address the inference problem concerning regression coefficients in a classical linear regression model using least squares estimates. The analysis is conducted under circumstances where network dependency exists across units in the sample. Neglecting the dependency among observations may lead to biased estimation of the asymptotic variance and often inflates the Type I error in coefficient inference. In this paper, we first establish a central limit theorem for the ordinary least squares estimate, with a verifiable dependence condition alongside corresponding neighborhood growth conditions. Subsequently, we propose a consistent estimator for the asymptotic variance of the estimated coefficients, which employs a data-driven method to balance the bias-variance trade-off. We find that the optimal tuning depends on the linear hypothesis under consideration and must be chosen adaptively. The presented theory and methods are illustrated and supported by numerical experiments and a data example.
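To make the abstract's setting concrete, the following is a minimal sketch of least squares inference with a network-truncated sandwich variance, written under stated assumptions. It is not the estimator proposed in the paper: the function names (`hop_distances`, `ols_network_cov`), the uniform truncation kernel, and the hand-picked `bandwidth` are all illustrative choices, whereas the paper describes a data-driven tuning that adapts to the linear hypothesis under test. The sketch only shows the general idea that cross products of residuals are retained for pairs of units within a given number of hops in the network.

```python
import numpy as np


def hop_distances(adj, max_hops):
    """Hop distances on an undirected graph, computed up to max_hops.

    Pairs farther apart than max_hops (or disconnected) are marked
    with the value max_hops + 1.
    """
    n = adj.shape[0]
    a = (np.asarray(adj) != 0).astype(int)
    dist = np.full((n, n), max_hops + 1, dtype=int)
    np.fill_diagonal(dist, 0)
    reached = np.eye(n, dtype=bool)
    frontier = np.eye(n, dtype=bool)
    for h in range(1, max_hops + 1):
        # Nodes adjacent to the current frontier that were not reached earlier.
        nxt = ((frontier.astype(int) @ a) > 0) & ~reached
        dist[nxt] = h
        reached |= nxt
        frontier = nxt
    return dist


def ols_network_cov(X, y, adj, bandwidth):
    """OLS coefficients with a neighborhood-truncated sandwich covariance.

    Assumption (illustrative only): pairs of units within `bandwidth` hops
    get weight 1, all other pairs get weight 0. With bandwidth = 0 this
    reduces to the usual heteroskedasticity-robust (HC0) sandwich.
    """
    XtX = X.T @ X
    beta = np.linalg.solve(XtX, X.T @ y)
    resid = y - X @ beta
    weights = (hop_distances(adj, bandwidth) <= bandwidth).astype(float)
    scores = X * resid[:, None]           # row i is x_i * e_i
    meat = scores.T @ weights @ scores    # sum over retained pairs (i, j)
    bread = np.linalg.inv(XtX)
    cov = bread @ meat @ bread
    return beta, cov
```

A call such as `ols_network_cov(X, y, adj, bandwidth=2)` returns the coefficient vector and a covariance matrix for it. A larger `bandwidth` retains more dependent pairs (less bias) at the cost of a noisier estimate, which is the trade-off the abstract refers to. With a hard truncation like this one, the estimated matrix is not guaranteed to be positive semidefinite, so diagonal entries should be checked before taking square roots.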

