Treatment Effect Heterogeneity and Importance Measures for Multivariate Continuous Treatments (2404.09126v1)
Abstract: Estimating the joint effect of a multivariate, continuous exposure is crucial, particularly in environmental health, where interest lies in simultaneously evaluating the impact of multiple environmental pollutants on health. We develop novel methodology that addresses two key issues in estimating the treatment effects of multivariate, continuous exposures. First, we use flexible nonparametric Bayesian methodology so that our approach can capture a wide range of data-generating processes. Second, we allow the effect of the exposures to be heterogeneous with respect to covariates. Treatment effect heterogeneity has not been well explored in the causal inference literature for multivariate, continuous exposures; we therefore introduce novel estimands that summarize the nature and extent of this heterogeneity and propose procedures to estimate them. We provide theoretical support for the proposed models in the form of posterior contraction rates and show that they perform well in simulated examples both with and without heterogeneity. We apply our approach to a study of the health effects of simultaneous exposure to the components of PM$_{2.5}$ and find that the negative health effects of exposure to these pollutants are exacerbated by low socioeconomic status and age.
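The abstract does not spell out the new heterogeneity estimands, so the sketch below is only a generic illustration under our own assumptions, not the authors' definitions. It uses a hypothetical toy outcome surface `f(w, x)` with a two-pollutant exposure vector `w` and two covariates (named `income` and `age` purely to echo the application). The conditional effect of shifting exposure from `w_low` to `w_high` is `tau(x) = f(w_high, x) - f(w_low, x)`; one common summary of heterogeneity is the variance of `tau(X)` across the covariate sample. The covariate importance score, the share of that variance lost when covariate `j` is fixed at its sample mean, is a hypothetical measure chosen for simplicity.

```python
import random
import statistics


# Hypothetical toy outcome surface (NOT the paper's model).
# w = (w1, w2): two-pollutant exposure vector; x = (income, age): covariates.
# The exposure effect is scaled by age only, so heterogeneity is driven by age.
def f(w, x):
    income, age = x
    return 0.5 * income + (1.0 + 0.1 * age) * (w[0] + 2.0 * w[1])


def cate(x, w_low, w_high, outcome=f):
    """Conditional effect of shifting exposure from w_low to w_high at covariates x."""
    return outcome(w_high, x) - outcome(w_low, x)


def heterogeneity(xs, w_low, w_high, outcome=f):
    """Population variance of the conditional effect across the covariate sample."""
    taus = [cate(x, w_low, w_high, outcome) for x in xs]
    return statistics.pvariance(taus)


def importance(xs, j, w_low, w_high, outcome=f):
    """Hypothetical score: share of effect variance lost when covariate j is
    replaced by its sample mean (0 = irrelevant, 1 = drives all heterogeneity)."""
    total = heterogeneity(xs, w_low, w_high, outcome)
    if total == 0:
        return 0.0
    mean_j = statistics.fmean(x[j] for x in xs)
    fixed = [tuple(mean_j if k == j else v for k, v in enumerate(x)) for x in xs]
    return 1.0 - heterogeneity(fixed, w_low, w_high, outcome) / total


random.seed(0)
# Simulated covariates: standardized income and age uniformly between 65 and 95.
xs = [(random.gauss(0.0, 1.0), random.uniform(65.0, 95.0)) for _ in range(500)]
w_low, w_high = (0.0, 0.0), (1.0, 1.0)

h = heterogeneity(xs, w_low, w_high)
imp_income = importance(xs, 0, w_low, w_high)
imp_age = importance(xs, 1, w_low, w_high)
```

Because `tau(x) = 3 + 0.3 * age` in this toy surface, fixing income leaves the effect variance unchanged (importance near 0), while fixing age removes it entirely (importance near 1). A real analysis would replace the known `f` with posterior draws of a fitted model.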