Contamination Bias in Linear Regressions (2106.05024v5)

Published 9 Jun 2021 in econ.EM and stat.ME

Abstract: We study regressions with multiple treatments and a set of controls that is flexible enough to purge omitted variable bias. We show that these regressions generally fail to estimate convex averages of heterogeneous treatment effects -- instead, estimates of each treatment's effect are contaminated by non-convex averages of the effects of other treatments. We discuss three estimation approaches that avoid such contamination bias, including the targeting of easiest-to-estimate weighted average effects. A re-analysis of nine empirical applications finds economically and statistically meaningful contamination bias in observational studies; contamination bias in experimental studies is more limited due to smaller variability in propensity scores.

Citations (36)

Summary

The paper demonstrates that regression estimates inadvertently mix treatment effects, leading to non-convex averages and contamination bias.
It derives specific conditions under which bias arises, emphasizing mis-specification in linear propensity score models as a key factor.
It discusses three remedies: estimating unweighted average treatment effects (ATEs), easiest-to-estimate weighting (EW), and common weighting (CW).

Understanding Contamination Bias in Linear Regressions

The paper "Contamination Bias in Linear Regressions" by Goldsmith-Pinkham, Hull, and Kolesár explores a critical issue in the estimation of treatment effects using linear regression models that include multiple treatments. This issue, termed "contamination bias," arises when the regression estimates meant to capture the effect of a particular treatment inadvertently incorporate the effects of other treatments, thus complicating the interpretation of these estimates.

Core Arguments and Findings

The central thesis of the paper is that multiple-treatment regressions generally fail to estimate convex averages of heterogeneous treatment effects. Instead, each coefficient is contaminated by a non-convex average of the effects of the other treatments. This contrasts with the result in Angrist (1998) that a regression on a single binary treatment estimates a convex average of treatment effects when the treatment is conditionally as good as randomly assigned: that guarantee does not carry over once several treatments enter the regression.

The authors provide a comprehensive characterization of contamination bias. Even when the controls are flexible enough to remove omitted variable bias (OVB), the estimate for one treatment is contaminated because of non-linear dependence among the included treatments. As a result, even when treatment assignment is conditionally as good as random, linear regression adjustment generally does not isolate the effect of any single treatment.
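The characterization can be written compactly. The display below is a sketch in assumed notation (the symbols $\beta_k$, $\tau_k(W)$, and $\lambda_{k\ell}(W)$ are chosen here for illustration and are not quoted from the paper): the regression coefficient on treatment $k$ splits into an own-effect term and a contamination term, where $W$ denotes the controls and $\tau_k(W)$ the conditional average effect of treatment $k$.

```latex
\beta_k
  = \underbrace{E\big[\lambda_{kk}(W)\,\tau_k(W)\big]}_{\text{own-treatment effect}}
  + \underbrace{\sum_{\ell \neq k} E\big[\lambda_{k\ell}(W)\,\tau_\ell(W)\big]}_{\text{contamination bias}},
\qquad
E\big[\lambda_{kk}(W)\big] = 1,
\quad
E\big[\lambda_{k\ell}(W)\big] = 0 \ \ (\ell \neq k).
```

Because each set of contamination weights averages to zero without being identically zero, it has to be negative for some values of the controls, which is why the contaminating averages are non-convex. The contamination term drops out when the other treatments' effects $\tau_\ell(W)$ do not vary with $W$.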

A significant contribution of the paper is the derivation of conditions under which contamination bias arises. The authors demonstrate that if the treatments are mutually exclusive and the linear propensity score model is misspecified, regression estimates can be substantially biased. They also show that the own-treatment weights are non-negative only under specific conditions, mainly when linearity and separability hold in the treatment assignment model.
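A small numerical illustration may help. The sketch below is not taken from the paper: it assumes a hypothetical population with two strata, three mutually exclusive arms, and made-up propensities and effects, then computes the exact population regression of the outcome on the two treatment indicators and a stratum dummy. Arm 1 is given no effect at all, so any non-zero coefficient on it is pure contamination from arm 2.

```python
import numpy as np

# Hypothetical population (all numbers are illustrative assumptions).
# Two strata W in {0, 1}; three mutually exclusive arms D in {0, 1, 2}.
share = {0: 0.5, 1: 0.5}                       # P(W = w)
propensity = {0: (0.50, 0.05, 0.45),           # P(D = d | W = 0)
              1: (0.10, 0.45, 0.45)}           # P(D = d | W = 1)
tau1 = {0: 0.0, 1: 0.0}                        # arm 1 has no effect anywhere
tau2 = {0: 0.0, 1: 1.0}                        # arm 2 effect varies by stratum

# One row per (W, D) cell: regressors (1, D1, D2, W), cell mass, mean outcome.
rows, mass, mean_y = [], [], []
for w in (0, 1):
    for d in (0, 1, 2):
        rows.append([1.0, float(d == 1), float(d == 2), float(w)])
        mass.append(share[w] * propensity[w][d])
        mean_y.append(tau1[w] * (d == 1) + tau2[w] * (d == 2))

X = np.array(rows)
m = np.array(mass)
y = np.array(mean_y)

# Population OLS via the weighted normal equations X' diag(m) X b = X' diag(m) y.
XtM = X.T * m
beta = np.linalg.solve(XtM @ X, XtM @ y)

print("coefficient on arm 1:", beta[1])   # true effect of arm 1 is 0
print("coefficient on arm 2:", beta[2])   # true ATE of arm 2 is 0.5
```

Under these assumed numbers the coefficient on arm 1 comes out near -0.47 even though arm 1 does nothing, because arm 1's propensity differs sharply across strata while arm 2's effect is heterogeneous.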

Proposed Solutions

To address contamination bias, the paper discusses three methodologies:

Estimation of Unweighted Average Treatment Effects (ATEs): This method hinges on strong overlap, ensuring that each treatment is sufficiently represented across different covariate strata. Methods like inverse propensity score weighting or doubly-robust estimators are recommended for robust ATE estimation.
Easiest-to-Estimate Weighting (EW): This approach targets the weighted average effect that minimizes the semiparametric efficiency bound under homoskedasticity, which yields a convex weighting scheme tailored to precise estimation. It is robust to limited overlap, making it preferable for observational data where extreme propensity scores are common.
Common Weighting (CW) Scheme: To facilitate comparisons across treatment arms, this approach applies the same weights to every treatment, designed to optimize precision when measuring contrasts under a specified distribution of treatment interest.
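The difference between the first two targets can be sketched numerically. The example below uses a hypothetical two-stratum population with made-up propensities and effects. The ATE weights are simply the stratum shares. For the easiest-to-estimate target, the code assumes an inverse-variance form, with weights proportional to P(W = w) / (1/p_k(w) + 1/p_0(w)); this follows from standard homoskedastic precision arguments, and the paper's exact formulas may differ. The common-weighting scheme is not covered here.

```python
import numpy as np

# Hypothetical strata (illustrative numbers, not from the paper).
share = np.array([0.5, 0.5])                  # P(W = w)
p = np.array([[0.50, 0.05, 0.45],             # P(D = control, 1, 2 | W = 0)
              [0.10, 0.45, 0.45]])            # P(D = control, 1, 2 | W = 1)
tau = np.array([[0.0, 0.0],                   # (tau_1, tau_2) in stratum 0
                [0.0, 1.0]])                  # (tau_1, tau_2) in stratum 1

def normalize(raw):
    """Scale each column (one per treatment arm) to sum to one."""
    return raw / raw.sum(axis=0)

# Unweighted ATE: every stratum counts in proportion to its population share.
ate_weights = normalize(np.column_stack([share, share]))

# Assumed inverse-variance form: strata where either the arm or the control
# group is rare get little weight, which is what makes the target easy to
# estimate under limited overlap.
raw = share[:, None] / (1.0 / p[:, 1:] + 1.0 / p[:, [0]])
ew_weights = normalize(raw)

ate = (ate_weights * tau).sum(axis=0)
ew = (ew_weights * tau).sum(axis=0)

print("ATE targets (arm 1, arm 2):", ate)
print("EW  targets (arm 1, arm 2):", ew)
print("EW weights by stratum:\n", ew_weights)
```

Both sets of weights are non-negative and sum to one within each arm, so each target is a convex average of that arm's own effects with no contribution from the other arm. They answer different questions, though: with these numbers the two targets for arm 2 differ noticeably because the assumed EW weights lean toward the stratum with better overlap.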

Practical Implications and Empirical Evidence

Goldsmith-Pinkham et al. apply their theoretical insights to nine empirical studies, which include randomized controlled trials (RCTs) and observational datasets. They find economically and statistically meaningful contamination bias in the observational studies, where propensity scores vary widely across strata; in the experimental studies the bias is more limited because propensity scores vary less. The empirical results underscore the need to check and adjust for contamination bias to ensure credible causal inference.
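The link between propensity score variability and contamination can be illustrated with a stylized calculation. The sketch below is an assumption-laden toy, not a replication of any of the nine applications: the helper `population_ols` and all numbers are hypothetical. It varies how far arm 1's propensity differs between two strata (the `p1_gap` parameter) and reports the exact population coefficient on arm 1, whose true effect is set to zero.

```python
import numpy as np

def population_ols(p1_gap):
    """Exact population OLS of Y on (1, D1, D2, W) in a hypothetical design.

    Arm 1's propensity is 0.25 - p1_gap in stratum 0 and 0.25 + p1_gap in
    stratum 1; arm 2's propensity is 0.45 in both. Arm 1 has zero effect and
    arm 2's effect is 0 in stratum 0 and 1 in stratum 1.
    """
    share = {0: 0.5, 1: 0.5}
    tau2 = {0: 0.0, 1: 1.0}
    prop = {}
    for w, sign in ((0, -1.0), (1, 1.0)):
        p1 = 0.25 + sign * p1_gap
        prop[w] = (1.0 - p1 - 0.45, p1, 0.45)   # (control, arm 1, arm 2)

    rows, mass, mean_y = [], [], []
    for w in (0, 1):
        for d in (0, 1, 2):
            rows.append([1.0, float(d == 1), float(d == 2), float(w)])
            mass.append(share[w] * prop[w][d])
            mean_y.append(tau2[w] if d == 2 else 0.0)

    X, m, y = np.array(rows), np.array(mass), np.array(mean_y)
    XtM = X.T * m                                # X' diag(m)
    return np.linalg.solve(XtM @ X, XtM @ y)     # (const, D1, D2, W)

# p1_gap = 0 mimics a simple experiment with constant assignment probabilities.
bias = {gap: population_ols(gap)[1] for gap in (0.0, 0.1, 0.2)}
for gap, b in bias.items():
    print(f"propensity gap {gap:.1f}: coefficient on arm 1 = {b:+.3f}")
```

With constant propensities the coefficient on arm 1 is zero, as it should be, and it moves further from zero as the gap widens, which matches the qualitative pattern described for experimental versus observational studies.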

For practitioners and policymakers, these findings highlight how fragile treatment effect estimates can be in complex, multi-treatment environments. In particular, observational studies with limited overlap demand a careful choice of estimation technique, because the estimates depend heavily on the chosen weighting strategy.

Future Directions

The paper opens avenues for future research in more refined methods that account for heterogeneous treatment responses in complex experimental and non-experimental settings. The insights call for a broader inquiry into flexible modeling techniques that successfully circumvent contamination while preserving interpretability.

Ultimately, the paper by Goldsmith-Pinkham, Hull, and Kolesár advances the econometric literature on treatment effect estimation by offering a diagnostic for how bias spreads across coefficients in linear models. As empirical strategies continue to evolve, understanding and mitigating contamination bias will remain important for obtaining robust, policy-relevant estimates.

Tweets

https://twitter.com/causalinf/status/1822308188042956992

https://twitter.com/eBlogs/status/1758388383691247675