Program Evaluation with Remotely Sensed Outcomes (2411.10959v2)

Published 17 Nov 2024 in econ.EM, cs.LG, math.ST, stat.AP, stat.ME, stat.ML, and stat.TH

Abstract: Economists often estimate treatment effects in experiments using remotely sensed variables (RSVs), e.g. satellite images or mobile phone activity, in place of directly measured economic outcomes. A common practice is to use an observational sample to train a predictor of the economic outcome from the RSV, and then to use its predictions as the outcomes in the experiment. We show that this method is biased whenever the RSV is post-outcome, i.e. if variation in the economic outcome causes variation in the RSV. In program evaluation, changes in poverty or environmental quality cause changes in satellite images, but not vice versa. As our main result, we nonparametrically identify the treatment effect by formalizing the intuition that underlies common practice: the conditional distribution of the RSV given the outcome and treatment is stable across the samples. Based on our identifying formula, we find that the efficient representation of RSVs for causal inference requires three predictions rather than one. Valid inference does not require any rate conditions on RSV predictions, justifying the use of complex deep learning algorithms with unknown statistical properties. We re-analyze the effect of an anti-poverty program in India using satellite images.

Summary

The paper demonstrates that the common practice of using RSV-based predictions as experimental outcomes is biased whenever the RSV is post-outcome, that is, whenever variation in the economic outcome causes variation in the RSV.
It nonparametrically identifies the treatment effect under the assumption that the conditional distribution of the RSV given the outcome and treatment is stable across samples, and valid inference requires no rate conditions on the RSV predictions.
In an empirical application to an anti-poverty program in India, the approach replicates the original findings from satellite images, conservatively saving approximately three million dollars in survey costs.

Analyzing the Use of Remotely Sensed Outcomes in Program Evaluation

The paper "Program Evaluation with Remotely Sensed Outcomes" by Ashesh Rambachan, Rahul Singh, and Davide Viviano challenges how remotely sensed variables (RSVs) are currently used in program evaluation, emphasizing the bias that arises when an RSV is post-outcome, that is, when the economic outcome causes variation in the RSV rather than the reverse. Traditional evaluations often rely on surveys to measure economic outcomes; this paper investigates when those costly surveys can be replaced with RSVs derived from mobile phone activity or satellite images to infer economic outcomes such as living standards and environmental quality.

Key Contributions and Theoretical Insights

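This research establishes that using predictions built from RSVs, such as night lights or satellite images, as outcome variables produces biased treatment effect estimates when the RSV is post-outcome. The bias arises because common practice implicitly treats the RSV as a surrogate, an intermediary between the treatment and the eventual outcome, whereas a post-outcome RSV sits downstream of the outcome: changes in poverty or environmental quality change the satellite image, not vice versa. The resulting distortions can be large.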
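The bias can be seen in a small simulation. The following is a minimal sketch, not the paper's design: the data-generating process, the parameter values, the function name `simulate_common_practice`, and the use of a linear predictor in place of a deep learning model are all assumptions made for illustration. The outcome is binary, the RSV is the outcome plus noise (so it is post-outcome), and a predictor trained on an observational sample is applied to the experiment.

```python
import numpy as np


def simulate_common_practice(n=200_000, seed=0):
    """Compare the true effect with the 'predict, then plug in' estimate.

    Hypothetical design: binary outcome Y, scalar RSV R = Y + noise.
    """
    rng = np.random.default_rng(seed)

    # Observational sample: Y is observed alongside the RSV.
    y_obs = rng.binomial(1, 0.5, size=n)
    r_obs = y_obs + rng.normal(0.0, 1.0, size=n)  # Y causes R (post-outcome)

    # Train a linear predictor of Y from R (stand-in for any ML predictor).
    slope, intercept = np.polyfit(r_obs, y_obs, 1)

    # Experiment: treatment D raises P(Y = 1) from 0.3 to 0.5.
    d = rng.binomial(1, 0.5, size=n)
    y_exp = rng.binomial(1, 0.3 + 0.2 * d)  # not observed in practice
    r_exp = y_exp + rng.normal(0.0, 1.0, size=n)

    # Common practice: use predicted outcomes as if they were outcomes.
    y_hat = intercept + slope * r_exp

    true_effect = y_exp[d == 1].mean() - y_exp[d == 0].mean()
    plug_in_effect = y_hat[d == 1].mean() - y_hat[d == 0].mean()
    return true_effect, plug_in_effect


if __name__ == "__main__":
    true_effect, plug_in_effect = simulate_common_practice()
    print(f"true effect:    {true_effect:.3f}")
    print(f"plug-in effect: {plug_in_effect:.3f}")
```

In this design the plug-in estimate is attenuated toward zero: the predictor shrinks noisy RSV values toward the observational mean, so only a fraction of the shift in the outcome passes through to the predictions. The size of the attenuation depends on the assumed noise level and outcome prevalence, so the numbers here say nothing about any real application.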

The authors nonparametrically identify the treatment effect under a stability condition: the conditional distribution of the RSV given the outcome and treatment is the same across the observational and experimental samples. Under this condition, variation in treatment and in RSV-based predictions can be combined to recover the average treatment effect. Crucially, the method does not require specifying or consistently estimating the relationship between the RSV, the outcome, and the treatment, a common obstacle given the complexity and unstructured nature of RSV data.
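To see why stability can be enough, consider a simplified special case. This is a sketch under assumptions that go beyond the text above, not the paper's general identifying formula: the outcome $Y$ is binary, treatment $D$ is randomly assigned, and the RSV $R$ depends on $D$ only through $Y$. The sample indicator $S \in \{e, o\}$ (experimental, observational), the function $h$, and the symbols $\mu_y$ and $\theta$ are notation introduced here for illustration.

```latex
% h is any fixed function of the RSV, e.g. a trained predictor.
% First line: law of total expectation, using R independent of D given Y.
% Second line: stability of the distribution of R given Y across samples.
\begin{align*}
\mathbb{E}[h(R) \mid D = d, S = e]
  &= \sum_{y \in \{0,1\}} \mathbb{E}[h(R) \mid Y = y, S = e]\,
     \Pr(Y = y \mid D = d, S = e) \\
  &= \sum_{y \in \{0,1\}} \mu_y \, \Pr(Y = y \mid D = d, S = e),
  \qquad \mu_y := \mathbb{E}[h(R) \mid Y = y, S = o].
\end{align*}

% Differencing across treatment arms, with
% \theta := \Pr(Y = 1 \mid D = 1, S = e) - \Pr(Y = 1 \mid D = 0, S = e):
\begin{align*}
\mathbb{E}[h(R) \mid D = 1, S = e] - \mathbb{E}[h(R) \mid D = 0, S = e]
  &= (\mu_1 - \mu_0)\,\theta, \\
\theta
  &= \frac{\mathbb{E}[h(R) \mid D = 1, S = e] - \mathbb{E}[h(R) \mid D = 0, S = e]}
          {\mu_1 - \mu_0}
  \quad \text{whenever } \mu_1 \neq \mu_0 .
\end{align*}
```

In this special case the treatment effect $\theta$ is a ratio of conditional moments, each computable in one sample without observing $Y$ in the experiment. The function $h$ only needs to separate the two outcome groups on average; it does not need to be a consistent estimate of anything.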

Methodological Framework and Empirical Application

To illustrate the method, the authors re-analyze a large-scale anti-poverty program in India, using satellite imagery to replicate its estimated effects on local consumption and poverty without direct outcome observations in half of the experimental sample. This approach not only confirmed the original findings but also significantly reduced research costs, conservatively saving approximately three million dollars by replacing expensive surveys with satellite-based proxies.

The methodological contribution lies in the nonparametric identification strategy, which avoids the bias that comes from treating RSVs as surrogates. The authors show that the average treatment effect can be identified from simple conditional moments, and that the efficient representation of the RSV for causal inference requires three predictions rather than one. Neither step requires a priori knowledge of the underlying complex relationships.
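As a rough illustration of identification through conditional moments, the sketch below estimates a treatment effect as a ratio of two mean contrasts. It is not the paper's estimator: it assumes a binary outcome, an RSV that depends on treatment only through the outcome, and the same RSV-given-outcome distribution in both samples. The function names `ratio_estimate` and `demo`, the simulated data, and the choice of `np.tanh` as the RSV representation are hypothetical.

```python
import numpy as np


def ratio_estimate(h_exp, d_exp, h_obs, y_obs):
    """Ratio of conditional-moment contrasts (illustrative only).

    h_exp, d_exp: representation h(R) and treatment in the experiment.
    h_obs, y_obs: representation h(R) and outcome in the observational sample.
    """
    # Contrast of h(R) across treatment arms (experiment, Y unobserved).
    numerator = h_exp[d_exp == 1].mean() - h_exp[d_exp == 0].mean()
    # Contrast of h(R) across outcome groups (observational sample).
    denominator = h_obs[y_obs == 1].mean() - h_obs[y_obs == 0].mean()
    return numerator / denominator


def demo(n=200_000, seed=1):
    rng = np.random.default_rng(seed)

    # Observational sample: outcome Y and post-outcome RSV R = Y + noise.
    y_obs = rng.binomial(1, 0.5, size=n)
    r_obs = y_obs + rng.normal(size=n)

    # Experiment: treatment raises P(Y = 1) from 0.3 to 0.5 (true effect 0.2).
    d = rng.binomial(1, 0.5, size=n)
    y_exp = rng.binomial(1, 0.3 + 0.2 * d)  # used only to generate R
    r_exp = y_exp + rng.normal(size=n)

    h = np.tanh  # arbitrary fixed representation; no model of Y given R needed
    return ratio_estimate(h(r_exp), d, h(r_obs), y_obs)


if __name__ == "__main__":
    print(f"ratio estimate: {demo():.3f}")
```

The representation `h` is deliberately arbitrary here: under the stated assumptions, any function whose mean differs across outcome groups recovers the effect, although a poorly chosen `h` makes the denominator small and the estimate noisy.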

Implications and Future Directions

These findings underscore the need to reconsider how empirical researchers use RSVs when estimating treatment effects. Misusing RSVs as surrogates can lead to large biases, which highlights the pitfalls of empirical practices that lack robust theoretical backing. The authors’ approach opens pathways for further work on high-dimensional data in causal inference, with applications ranging from environmental economics to public health, where ground-truth outcomes are difficult or expensive to collect.

Further directions include refining the methodology to extend beyond binary outcomes and integrating it with more advanced machine learning tools to improve the precision of RSV representations. The insights from this paper suggest that careful consideration of the causal pathways and the structuring of data representation can refine empirical estimations significantly, with broad implications for how program evaluations are conducted using remotely sensed data.