On the implied weights of linear regression for causal inference (2104.06581v4)
Abstract: A basic principle in the design of observational studies is to approximate the randomized experiment that would have been conducted under controlled circumstances. In practice, linear regression models are commonly used to analyze observational data and estimate causal effects. How do linear regression adjustments in observational studies emulate key features of randomized experiments, such as covariate balance, self-weighted sampling, and study representativeness? In this paper, we answer this and related questions by analyzing the implied (individual-level data) weights of linear regression methods. We derive new closed-form expressions for the weights and examine their properties in both finite and asymptotic regimes. We show that the implied weights of general regression problems can be equivalently obtained by solving a convex optimization problem. Among other results, we study the doubly and multiply robust properties of regression estimators from the perspective of their implied weights. This equivalence allows us to bridge ideas from the regression modeling and causal inference literatures. As a result, we propose novel regression diagnostics for causal inference that belong to the design stage of an observational study. As special cases, we analyze the implied weights in common settings such as multi-valued treatments and regression adjustment after matching. We implement the weights and diagnostics in the new lmw package for R.
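To illustrate the core idea of implied weights, the sketch below shows, in Python with numpy (the paper's lmw package is for R), how the OLS coefficient on a binary treatment can be written as a weighted sum of outcomes, and how the normal equations force the properties the abstract alludes to: weights that sum to one within the treated group and that exactly balance the covariates. The simulated data and variable names are hypothetical, not from the paper; this is a minimal sketch under the assumption of a regression of the outcome on an intercept, the treatment indicator, and covariates.

```python
import numpy as np

# Hypothetical simulated data: binary treatment t, two covariates in X.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
t = rng.binomial(1, 0.5, size=n).astype(float)
y = 1.0 + 2.0 * t + X @ np.array([0.5, -0.3]) + rng.normal(size=n)

# Design matrix for the regression y ~ 1 + t + X.
Z = np.column_stack([np.ones(n), t, X])

# The treatment coefficient is a weighted sum of outcomes:
#   tau_hat = sum_i w_i y_i,  with  w = Z (Z'Z)^{-1} e_t,
# where e_t selects the treatment column of the design matrix.
e_t = np.zeros(Z.shape[1])
e_t[1] = 1.0
w = Z @ np.linalg.solve(Z.T @ Z, e_t)

tau_hat = w @ y
beta = np.linalg.lstsq(Z, y, rcond=None)[0]
assert np.isclose(tau_hat, beta[1])  # same number as the OLS coefficient on t

# The identity Z'w = e_t implies, column by column:
assert np.isclose(w.sum(), 0.0)          # intercept column: weights sum to 0
assert np.isclose(w[t == 1].sum(), 1.0)  # treated weights sum to 1 (controls to -1)
assert np.allclose(w @ X, 0.0)           # exact covariate balance
```

The balance property follows directly from the normal equations, `Z'w = Z'Z (Z'Z)^{-1} e_t = e_t`: each covariate column of `Z` is orthogonal to the weights, which is the regression analogue of the covariate balance a randomized experiment achieves by design.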