Machine Learning Methods Economists Should Know About

Published 24 Mar 2019 in econ.EM and stat.ML (arXiv:1903.10075v1)

Abstract: We discuss the relevance of the recent Machine Learning (ML) literature for economics and econometrics. First we discuss the differences in goals, methods and settings between the ML literature and the traditional econometrics and statistics literatures. Then we discuss some specific methods from the machine learning literature that we view as important for empirical researchers in economics. These include supervised learning methods for regression and classification, unsupervised learning methods, as well as matrix completion methods. Finally, we highlight newly developed methods at the intersection of ML and econometrics, methods that typically perform better than either off-the-shelf ML or more traditional econometric methods when applied to particular classes of problems, problems that include causal inference for average treatment effects, optimal policy estimation, and estimation of the counterfactual effect of price changes in consumer choice models.

Summary

The paper argues that combining machine learning and econometric techniques can improve both prediction accuracy and causal estimation.
It reviews supervised and unsupervised methods, such as regularization, tree ensembles, and clustering, that help address overfitting and capture complex patterns in data.
It argues that integrating these ML approaches can strengthen policy evaluation and help researchers handle high-dimensional data.

Summary of "Machine Learning Methods Economists Should Know About"

The paper "Machine Learning Methods Economists Should Know About" by Susan Athey and Guido W. Imbens examines the intersection of ML techniques and econometrics. The authors aim to familiarize economists with ML advancements relevant to empirical research and argue that incorporating these methodologies can enhance econometric analyses.

Key Distinctions and Integrations

The authors begin by contrasting the goals and methods of econometrics with the algorithmic orientation of ML. They note that econometrics has traditionally focused on estimating parameters and testing hypotheses, with attention to formal properties such as valid confidence intervals, while ML emphasizes out-of-sample prediction accuracy and methods that scale to large datasets. This distinction underpins the paper's central argument: economists should combine ML's predictive tools with econometric approaches to causal inference.

Machine Learning Techniques

The paper reviews several ML techniques deemed beneficial for economists:

Supervised Learning for Regression and Classification: The authors discuss regularization methods such as LASSO and ridge regression, which penalize large coefficients to reduce overfitting and to stabilize estimates when regressors are highly correlated. They also highlight tree-based methods and ensembles, such as random forests and boosting, as flexible tools for prediction.
Unsupervised Learning: The paper outlines clustering techniques such as k-means, along with topic models, as tools for finding structure in data without explicit labels, which can be useful for observational data with complex structure.
Matrix Completion: The authors present matrix completion methods for imputing missing entries in panel data and relate these techniques to traditional econometric panel models.
Text Analysis: Advances in text processing, particularly word embeddings, offer new ways to convert text into numerical representations, letting economists bring information from unstructured text into quantitative analyses.
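To make the regularization idea concrete, here is a minimal sketch of the LASSO fitted by cyclic coordinate descent in plain Python. It assumes the objective (1/(2n)) * sum_i (y_i - x_i'b)^2 + lam * sum_j |b_j| with no intercept and roughly centered data; the function names, the simulated data, and the penalty lam = 0.2 are illustrative assumptions, not taken from the paper. The soft-thresholding step is what sets the coefficients of irrelevant regressors exactly to zero.

```python
import random


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z toward zero by gamma; return exactly 0.0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * sum_i (y_i - x_i'b)^2 + lam * sum_j |b_j|
    by cyclic coordinate descent. No intercept: assumes roughly centered data.
    Hypothetical helper for illustration, not a library API."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X*beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Scaled inner product of column j with the partial residual
            # (the residual with column j's current contribution added back)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the residuals consistent with the updated coefficient
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Simulated data (hypothetical): only the first two of eight regressors matter.
random.seed(0)
n, p = 200, 8
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[1] + random.gauss(0, 1) for row in X]

beta_hat = lasso_coordinate_descent(X, y, lam=0.2)
print([round(b, 2) for b in beta_hat])
```

The first two estimates land near 3 and -2, shrunk slightly toward zero by the penalty, while most of the irrelevant coefficients are set exactly to zero. In applied work one would typically use a library implementation such as scikit-learn's `Lasso` or R's `glmnet` and choose the penalty by cross-validation rather than fixing it by hand.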

Econometrics Meets Machine Learning

Econometrics pays careful attention to causal inference, a domain where ML methods have started to yield new approaches:

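Average Treatment Effects (ATE): The authors emphasize doubly robust and orthogonalization techniques, in which ML methods estimate nuisance functions such as the propensity score and the conditional outcome means. This lets researchers adjust flexibly for many control variables while keeping the treatment-effect estimate relatively insensitive to errors in those nuisance estimates.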
Heterogeneous Treatment Effects: The paper outlines methods like causal forests for exploring treatment effect heterogeneity, which are crucial for designing targeted policies.
Policy Estimation: It discusses learning optimal treatment assignment rules from data, with performance measured by regret relative to the best rule in a given class, as well as multi-armed bandit methods as an adaptive alternative to conventional A/B testing.
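The doubly robust approach to average treatment effects can be sketched with the augmented inverse-probability-weighting (AIPW) estimator combined with two-fold cross-fitting. This is a minimal illustration under stated assumptions: the covariate is discrete, so cell averages stand in for the ML learners (for example, random forests or LASSO) one would use with many controls, and the data-generating process, the true effect of 2.0, and all function names are hypothetical. Because the covariate raises both the treatment probability and the outcome, the naive difference in means is biased upward.

```python
import random
from statistics import mean


def fit_nuisances(sample):
    """Estimate the propensity score e(x) and outcome means mu0(x), mu1(x)
    by cell averages. Hypothetical stand-in for ML learners; valid here
    only because x is discrete."""
    e, mu0, mu1 = {}, {}, {}
    for x in sorted({xi for xi, _, _ in sample}):
        cell = [(w, y) for xi, w, y in sample if xi == x]
        e[x] = mean(w for w, _ in cell)
        mu1[x] = mean(y for w, y in cell if w == 1)
        mu0[x] = mean(y for w, y in cell if w == 0)
    return e, mu0, mu1


def aipw_score(x, w, y, e, mu0, mu1):
    """AIPW (doubly robust) score for one unit: outcome-model contrast
    plus inverse-propensity-weighted residual corrections."""
    return (mu1[x] - mu0[x]
            + w * (y - mu1[x]) / e[x]
            - (1 - w) * (y - mu0[x]) / (1 - e[x]))


def aipw_ate_crossfit(data, seed=1):
    """Two-fold cross-fitting: fit nuisances on one half of the data,
    evaluate the scores on the other half, then swap and average."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    half = len(idx) // 2
    folds = [[data[i] for i in idx[:half]], [data[i] for i in idx[half:]]]
    scores = []
    for k in (0, 1):
        e, mu0, mu1 = fit_nuisances(folds[1 - k])
        scores += [aipw_score(x, w, y, e, mu0, mu1) for x, w, y in folds[k]]
    return mean(scores)


# Hypothetical confounded data: x raises both P(treated) and the outcome.
random.seed(0)
TRUE_ATE = 2.0
data = []
for _ in range(4000):
    x = random.choice([0, 1, 2])
    w = 1 if random.random() < 0.2 + 0.3 * x else 0
    y = 1.5 * x + TRUE_ATE * w + random.gauss(0, 1)
    data.append((x, w, y))

naive = (mean(y for x, w, y in data if w == 1)
         - mean(y for x, w, y in data if w == 0))
ate_hat = aipw_ate_crossfit(data)
print(f"naive difference in means: {naive:.2f}")
print(f"cross-fitted AIPW estimate: {ate_hat:.2f}")
```

The AIPW score corrects the outcome-model prediction with an inverse-propensity-weighted residual, so the estimate remains consistent if either the propensity score or the outcome model is estimated well. This is the sense in which the estimator is doubly robust, and cross-fitting keeps the nuisance estimates from being fitted on the same units they are evaluated on.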

Implications and Future Directions

The authors encourage econometricians to adopt ML techniques, not only for their predictive capabilities but also for their potential to refine causal estimates by exploiting high-dimensional data and accounting for complex interactions. They envision that, if econometric training is adapted to include ML, empirical research will become more robust, data-driven, and interdisciplinary.

The paper concludes that embracing ML methods will equip economists to analyze large datasets, address new research questions, and collaborate with researchers in other disciplines. The authors suggest that this integration could substantially change empirical practice in economics.