Data-adaptive exposure thresholds for the Horvitz-Thompson estimator of the Average Treatment Effect in experiments with network interference (2405.15887v2)
Abstract: Randomized controlled trials often suffer from interference, a violation of the Stable Unit Treatment Value Assumption (SUTVA) in which a unit's treatment assignment affects the outcomes of its neighbors. Interference biases naive estimators of the average treatment effect (ATE). A popular way to restore unbiasedness is to pair the Horvitz-Thompson estimator of the ATE with a known exposure mapping: a function that identifies which units in a given randomization are not subject to interference. For example, an exposure mapping can specify that any unit for which at least an $h$ fraction of its neighbors share its treatment status does not experience interference. However, this threshold $h$ is difficult to elicit from domain experts, and a misspecified threshold can induce bias. In this work, we propose a data-adaptive method to select the threshold $h$ that minimizes the mean squared error of the Horvitz-Thompson estimator. Our method estimates the bias and variance of the Horvitz-Thompson estimator under different thresholds using a linear dose-response model of the potential outcomes. We present simulations illustrating that our method improves upon non-adaptive choices of the threshold. We further illustrate the performance of our estimator by running experiments on a publicly available Amazon product similarity graph. Furthermore, we demonstrate that our method is robust to deviations from the linear potential outcomes model.
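The pipeline described in the abstract can be made concrete with a minimal Python sketch. It rests on several assumptions that are ours, not the paper's: treatments are assigned by independent Bernoulli($p$) coin flips; the graph is an adjacency list with no self-loops in which every unit has at least one neighbor; the exposure mapping is the $h$-fraction rule above; and the mean squared error of each candidate threshold is scored by simulating experiments from a least-squares fit of the linear dose-response model $y_i = a + b z_i + c s_i + \varepsilon_i$, where $s_i$ is the fraction of unit $i$'s treated neighbors. All function names (`exposure_probs`, `ht_estimate`, `select_threshold`, and so on) are hypothetical. The Monte Carlo scoring step is a simple stand-in chosen for illustration and should not be read as the bias and variance estimator derived in the paper.

```python
import math

import numpy as np


def treated_fraction(adj, z):
    """Fraction of each unit's neighbors that are treated (the 'dose' s_i)."""
    return np.array([z[nbrs].mean() for nbrs in adj])


def same_status_fraction(adj, z):
    """Fraction of each unit's neighbors that share the unit's own treatment status."""
    return np.array([np.mean(z[nbrs] == z[i]) for i, nbrs in enumerate(adj)])


def binom_tail(d, q, ks):
    """P(K in ks) for K ~ Binomial(d, q)."""
    return sum(math.comb(d, k) * q**k * (1 - q) ** (d - k) for k in ks)


def exposure_probs(adj, p, h):
    """Exact exposure probabilities under independent Bernoulli(p) assignment.

    pi1[i] = P(z_i = 1 and at least an h fraction of i's neighbors are treated)
    pi0[i] = P(z_i = 0 and at least an h fraction of i's neighbors are in control)
    Assumes 0 <= h <= 1, 0 < p < 1, no self-loops, and degree >= 1 for every unit.
    """
    pi1, pi0 = np.empty(len(adj)), np.empty(len(adj))
    for i, nbrs in enumerate(adj):
        d = len(nbrs)
        # Use the same comparison (k / d >= h) as the realized exposure check.
        ks = [k for k in range(d + 1) if k / d >= h]
        pi1[i] = p * binom_tail(d, p, ks)
        pi0[i] = (1 - p) * binom_tail(d, 1 - p, ks)
    return pi1, pi0


def ht_from_exposure(z, y, same, h, pi1, pi0):
    """Horvitz-Thompson ATE estimate from precomputed exposure quantities."""
    e1 = (same >= h) & (z == 1)  # treated and exposed to treatment
    e0 = (same >= h) & (z == 0)  # control and exposed to control
    return float(np.mean(e1 * y / pi1 - e0 * y / pi0))


def ht_estimate(adj, z, y, p, h):
    """Horvitz-Thompson ATE estimate under the h-fraction exposure mapping."""
    pi1, pi0 = exposure_probs(adj, p, h)
    return ht_from_exposure(z, y, same_status_fraction(adj, z), h, pi1, pi0)


def select_threshold(adj, z, y, p, grid, n_sims=200, seed=0):
    """Pick h in `grid` with the smallest simulated MSE under a fitted linear model.

    Illustrative stand-in (not the paper's estimator): fit y = a + b z + c s + noise
    by least squares, simulate experiments from the fit, and score each h by the
    Monte Carlo MSE of the HT estimate around the model-implied ATE, b + c.
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(len(y)), z, treated_fraction(adj, z)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma = np.std(y - X @ coef)
    target = coef[1] + coef[2]  # everyone treated (s = 1) vs. no one treated (s = 0)
    sims = []  # the same simulated experiments are reused for every h
    for _ in range(n_sims):
        zs = rng.binomial(1, p, size=len(adj))
        ys = coef[0] + coef[1] * zs + coef[2] * treated_fraction(adj, zs)
        ys = ys + sigma * rng.standard_normal(len(adj))
        sims.append((zs, ys, same_status_fraction(adj, zs)))
    mse = {}
    for h in grid:
        pi1, pi0 = exposure_probs(adj, p, h)
        est = np.array([ht_from_exposure(zs, ys, sm, h, pi1, pi0) for zs, ys, sm in sims])
        mse[h] = float(np.mean((est - target) ** 2))
    return min(mse, key=mse.get), mse


if __name__ == "__main__":
    # Hypothetical example: ring lattice, each unit linked to 2 units on each side.
    n = 60
    adj = [[(i + j) % n for j in (-2, -1, 1, 2)] for i in range(n)]
    rng = np.random.default_rng(1)
    z = rng.binomial(1, 0.5, size=n)
    y = 1 + 2 * z + 1.5 * treated_fraction(adj, z) + 0.1 * rng.standard_normal(n)
    best_h, mse = select_threshold(adj, z, y, 0.5, [0.0, 0.25, 0.5, 0.75, 1.0])
    print(best_h, mse)
```

Under this linear model the estimand is $b + c$ (everyone treated versus no one treated). With $h = 0$ every unit counts as exposed, and because a unit's neighbors are assigned independently of the unit itself, the estimator's expectation reduces to $b$, so its bias is $-c$. With $h = 1$ the estimator is unbiased for $b + c$ but relies only on units whose whole neighborhood matches their own status, which inflates its variance. The grid search trades these two effects off.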