Undersmoothed LASSO Models for Propensity Score Weighting and Synthetic Negative Control Exposures for Bias Detection (2506.17760v1)
Abstract: The propensity score (PS) is often used to control for large numbers of covariates in high-dimensional healthcare database studies. The least absolute shrinkage and selection operator (LASSO) is a data-adaptive prediction algorithm that has become the most widely used tool for large-scale PS estimation in these settings. However, recent work has shown that data-adaptive PS estimation can come at the cost of slow convergence rates, leaving PS-based causal estimators with poor statistical properties. Although this creates challenges for data-driven PS analyses, both theory and simulations have shown that LASSO PS models can converge fast enough to yield asymptotically efficient PS-weighted causal estimators. To achieve asymptotic efficiency, however, LASSO PS-weighted estimators must be properly tuned, which requires undersmoothing the fitted LASSO model. In this paper, we discuss challenges in determining how to undersmooth LASSO models for PS weighting and consider the use of balance diagnostics to select the degree of undersmoothing. Because no tuning criterion is universally best, we propose using synthetically generated negative control exposure studies to detect bias across alternative analytic choices. Specifically, we show that synthetic negative control exposures can identify undersmoothing techniques that likely violate partial exchangeability due to lack of control for measured confounding. We use a series of numerical studies to investigate the performance of alternative balance criteria for undersmoothing LASSO PS-weighted estimators, and the use of synthetic negative control exposure studies to detect biased analyses.
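
To make the idea of balance-guided undersmoothing concrete, the following is a minimal sketch and not the authors' procedure. It assumes simulated data, a hypothetical starting penalty `lam_start` standing in for a cross-validated value, a simple proximal gradient solver in place of a production LASSO library, and the maximum weighted absolute standardized mean difference as one possible balance criterion. All function and parameter names are illustrative. The synthetic negative control exposure step is not covered here.

```python
import math
import random
import statistics


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_ps(X, b0, beta):
    # Fitted propensity scores P(A = 1 | X) under a logistic model.
    return [sigmoid(b0 + sum(b * v for b, v in zip(beta, x))) for x in X]


def fit_lasso_logistic(X, a, lam, step=0.5, iters=200):
    # L1-penalized logistic regression via proximal gradient descent (ISTA).
    # Fixed iteration count, no convergence check: illustration only.
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        resid = [q - t for q, t in zip(predict_ps(X, b0, beta), a)]
        b0 -= step * sum(resid) / n  # the intercept is not penalized
        for j in range(p):
            grad = sum(r * x[j] for r, x in zip(resid, X)) / n
            v = beta[j] - step * grad
            # Soft-thresholding is the proximal step for the L1 penalty.
            beta[j] = math.copysign(max(abs(v) - step * lam, 0.0), v)
    return b0, beta


def max_abs_smd(X, a, ps):
    # Largest absolute standardized mean difference across covariates
    # after inverse probability of treatment weighting.
    w = [1.0 / q if t == 1 else 1.0 / (1.0 - q) for t, q in zip(a, ps)]
    worst = 0.0
    for j in range(len(X[0])):
        means, variances = [], []
        for g in (1, 0):
            rows = [(wi, x[j]) for wi, x, t in zip(w, X, a) if t == g]
            means.append(sum(wi * v for wi, v in rows) / sum(wi for wi, _ in rows))
            variances.append(statistics.variance([v for _, v in rows]))
        pooled_sd = math.sqrt((variances[0] + variances[1]) / 2.0)
        worst = max(worst, abs(means[0] - means[1]) / pooled_sd)
    return worst


def undersmooth_by_balance(X, a, lam_start=0.1, n_steps=6, shrink=0.5):
    # Shrink the penalty step by step (less regularization = more
    # undersmoothing) and keep the fit with the best covariate balance.
    path, lam = [], lam_start
    for _ in range(n_steps):
        b0, beta = fit_lasso_logistic(X, a, lam)
        ps = predict_ps(X, b0, beta)
        path.append({"lam": lam, "smd": max_abs_smd(X, a, ps), "ps": ps})
        lam *= shrink
    return min(path, key=lambda r: r["smd"]), path


# Simulated example: treatment depends on the first two of five covariates.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(300)]
a = [1 if rng.random() < sigmoid(0.8 * x[0] - 0.5 * x[1]) else 0 for x in X]

best, path = undersmooth_by_balance(X, a)
for r in path:
    print(f"lambda={r['lam']:.5f}  max |SMD|={r['smd']:.4f}")
print("selected lambda:", best["lam"])
```

The sketch selects the penalty by balance alone. As the abstract notes, no single tuning criterion is universally best, so a selection like this would still need to be checked against other diagnostics.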