Demystifying and avoiding the OLS "weighting problem": Unmodeled heterogeneity and straightforward solutions (2403.03299v4)
Abstract: Researchers frequently estimate treatment effects by regressing outcomes (Y) on treatment (D) and covariates (X). Even without unobserved confounding, the coefficient on D yields a conditional-variance-weighted average of strata-wise effects, not the average treatment effect. Scholars have proposed characterizing the severity of these weights, evaluating the resulting biases, or changing investigators' target estimand to the conditional-variance-weighted effect. We aim to demystify these weights, clarifying how they arise, what they represent, and how to avoid them. Specifically, these weights reflect misspecification bias from unmodeled treatment-effect heterogeneity. Rather than diagnosing or tolerating them, we recommend avoiding the issue altogether by relaxing the standard regression assumption of "single linearity" to one of "separate linearity" (of each potential outcome in the covariates), which accommodates heterogeneity. Numerous methods, including regression imputation (g-computation), interacted regression, and mean balancing weights, satisfy this assumption. In many settings, the efficiency cost of avoiding this weighting problem will be modest and worthwhile.
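The contrast described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the data-generating process, the function names, and all numeric values are assumptions chosen for illustration. It uses one binary covariate X with stratum effects of 1 (X = 0) and 3 (X = 1), equal stratum sizes, and treatment probabilities of 0.5 and 0.1, so the average treatment effect is 2. Under these assumptions the conditional-variance weights are proportional to 0.5 × 0.25 = 0.125 and 0.5 × 0.09 = 0.045, so the plain regression coefficient on D targets (0.125 × 1 + 0.045 × 3) / 0.17 ≈ 1.53 rather than 2.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Hypothetical data: one binary covariate, heterogeneous effects."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n)           # two equally sized strata
    p = np.where(x == 1, 0.1, 0.5)             # treatment probability by stratum
    d = rng.binomial(1, p)                     # no unobserved confounding
    tau = np.where(x == 1, 3.0, 1.0)           # stratum-wise effects; ATE = 2
    y = 0.5 * x + tau * d + rng.normal(0.0, 1.0, size=n)
    return x, d, y


def ols(design, y):
    """Least-squares coefficients for y regressed on the design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def single_linearity(x, d, y):
    """Coefficient on D from regressing Y on D and X (no interaction)."""
    design = np.column_stack([np.ones_like(y), d, x])
    return ols(design, y)[1]


def interacted(x, d, y):
    """Coefficient on D from regressing Y on D, centered X, and D * centered X."""
    xc = x - x.mean()
    design = np.column_stack([np.ones_like(y), d, xc, d * xc])
    return ols(design, y)[1]


def regression_imputation(x, d, y):
    """Fit separate outcome models by arm, impute both outcomes, average the gap."""
    design = np.column_stack([np.ones_like(y), x])
    b1 = ols(design[d == 1], y[d == 1])        # model for treated outcomes
    b0 = ols(design[d == 0], y[d == 0])        # model for control outcomes
    return float(np.mean(design @ b1 - design @ b0))


if __name__ == "__main__":
    x, d, y = simulate()
    print("single linearity     :", single_linearity(x, d, y))
    print("interacted regression:", interacted(x, d, y))
    print("regression imputation:", regression_imputation(x, d, y))
```

In this setup the interacted regression and regression imputation give the same estimate, because centering X at its sample mean makes the coefficient on D equal to the average difference between the two fitted outcome models. Mean balancing weights, the third approach named in the abstract, are not shown here.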