Dealing with Logs and Zeros in Regression Models (2203.11820v1)

Published 22 Mar 2022 in econ.EM and stat.ME

Abstract: Log-linear models are prevalent in empirical research. Yet, how to handle zeros in the dependent variable remains an unsettled issue. This article clarifies it and addresses the log of zero by developing a new family of estimators called iterated Ordinary Least Squares (iOLS). This family nests standard approaches such as log-linear and Poisson regressions, offers several computational advantages, and corresponds to the correct way to perform the popular $\log(Y+1)$ transformation. We extend it to the endogenous regressor setting (i2SLS) and overcome other common issues with Poisson models, such as controlling for many fixed-effects. We also develop specification tests to help researchers select between alternative estimators. Finally, our methods are illustrated through numerical simulations and replications of landmark publications.

Summary

The paper introduces the iOLS estimator, which iteratively adjusts zero values to enable logarithmic transformation and reduce bias in log-linear models.
It outlines a model selection procedure with specification tests to guide researchers in choosing the most suitable estimator based on data characteristics.
Simulations and empirical applications highlight iOLS's robust performance compared to conventional fixes like adding constants or relying on Poisson regression.

Exploring Solutions for Zeros in Log-Linear Regression Models

The paper "Dealing with Logs and Zeros in Regression Models," by Bellégo, Benatia, and Pape, investigates how to handle zeros in the dependent variable of log-linear regression models. Log-linear models are widely used in empirical research because their coefficients can be read as elasticities or semi-elasticities. Zeros in the dependent variable are a problem for these models because the logarithm of zero is undefined. The paper proposes a new family of estimators called iterated Ordinary Least Squares (iOLS), together with a model selection procedure that guides researchers in choosing among estimators.

Overview of the Problem

The challenge of dealing with zeros in log-linear models is common in empirical research, yet consensus on the best approach remains elusive. Typically, several strategies are employed:

Adding a Constant (The Popular Fix): Adding a small positive constant to the dependent variable allows for the logarithm to be calculated, though this can introduce bias.
Discarding Zeros: This approach can lead to selection bias unless specific conditions about the data generation process are met.
Poisson Models: These accommodate zeros directly, but their consistency relies on the conditional mean being correctly specified, which restricts how the error term may depend on the regressors.
Inverse Hyperbolic Sine (IHS) Transformation: This alternative transformation accommodates zeros but may complicate elasticity interpretation.
Mixture Models: These address zero observations through sample selection models but are less commonly used due to complexity.
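The sensitivity of the "popular fix" to the chosen constant can be illustrated with a small simulation. This is only a sketch under assumptions that are not from the paper: the data-generating process (a multiplicative model with true slope 1 and 20% zeros that occur independently of the regressor), the helper names `ols_slope` and `simulate`, and the constants tried are all hypothetical.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

def simulate(n=20000, seed=0):
    """Hypothetical DGP: Y = exp(0.5 + x) * U, so the true slope on x is 1.
    U equals zero with probability 0.2, independently of x."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    u = rng.lognormal(mean=0.0, sigma=0.5, size=n) * rng.binomial(1, 0.8, size=n)
    y = np.exp(0.5 + x) * u
    return x, y

x, y = simulate()

# "Popular fix": regress log(Y + c) on x for several arbitrary constants c.
slopes = {c: ols_slope(x, np.log(y + c)) for c in (0.01, 1.0, 10.0)}

# Inverse hyperbolic sine transformation of Y, which is defined at zero.
ihs_slope = ols_slope(x, np.arcsinh(y))

for c, b in slopes.items():
    print(f"log(Y + {c}): slope = {b:.3f}")
print(f"arcsinh(Y):   slope = {ihs_slope:.3f}")
```

In this design the estimated slope depends on the constant and stays below the true value of 1 for each constant tried, which is the kind of bias the list above attributes to ad hoc transformations.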

Iterated Ordinary Least Squares (iOLS)

To overcome these limitations, the paper proposes iOLS, a flexible and computationally efficient estimator. iOLS operates by iteratively adding an observation-specific positive value to the response variable before applying the log transformation. Crucially, iOLS nests standard methods such as the log-linear model and Poisson regression, adapting to various data structures. The key innovation in iOLS is the transformation of the dependent variable into a weighted average that mitigates the bias induced by zero observations.
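A short piece of algebra shows why adding an observation-specific value can keep the model linear. It is a sketch under assumptions not spelled out in this summary: a multiplicative model $Y = \exp(X'\beta)\,U$ with error $U \ge 0$, and a hyper-parameter written here as $\delta > 0$.

$$\log\big(Y + \delta \exp(X'\beta)\big) = \log\big(\exp(X'\beta)\,(U + \delta)\big) = X'\beta + \log(\delta + U)$$

The left-hand side is defined even when $Y = 0$ (that is, $U = 0$), and if $\log(\delta + U)$ has a constant conditional mean given $X$, the transformed variable is linear in $X$ up to an intercept shift. Heuristically, as $\delta \to 0$ the error term approaches $\log U$, the log-linear case, while for large $\delta$ we have $\log(\delta + U) - \log \delta \approx U/\delta$, which points toward a Poisson-type condition on the mean of $U$.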

The general algorithm initializes the parameter estimates (for example, from a $\log(Y+1)$ regression), recomputes the transformed dependent variable at the current estimates, re-runs least squares on it, and repeats until the estimates converge.
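The iteration described above might look roughly like the following. This is a simplified sketch, not the authors' implementation: the function name `iols`, the hyper-parameter name `delta`, the transformation $\log(Y + \delta \exp(X'\beta))$, the sample-mean centering step, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def iols(y, X, delta=1.0, tol=1e-8, max_iter=1000):
    """Simplified iOLS-style fixed-point iteration (illustrative sketch).

    y: non-negative outcomes, possibly containing zeros.
    X: regressor matrix that includes a constant column.
    Returns (beta, number_of_iterations_used).
    """
    proj = np.linalg.pinv(X)              # equals (X'X)^{-1} X' when X has full column rank
    beta = proj @ np.log(y + 1.0)         # initialize with the log(Y+1) regression
    for it in range(1, max_iter + 1):
        xb = X @ beta
        # Transformed outcome: add an observation-specific positive value, then take logs.
        y_tilde = np.log(y + delta * np.exp(xb))
        # Assumed centering: subtract the sample mean of the implied error term.
        c = np.mean(y_tilde - xb)
        beta_new = proj @ (y_tilde - c)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, it
        beta = beta_new
    return beta, max_iter

# Hypothetical data: Y = exp(0.5 + x) * U with 20% zeros independent of x; true slope is 1.
rng = np.random.default_rng(1)
n = 20000
x = rng.normal(size=n)
u = rng.lognormal(0.0, 0.5, size=n) * rng.binomial(1, 0.8, size=n)
y = np.exp(0.5 + x) * u
X = np.column_stack([np.ones(n), x])

beta_hat, n_iter = iols(y, X, delta=1.0)
naive = np.linalg.pinv(X) @ np.log(y + 1.0)
print(f"log(Y+1) slope: {naive[1]:.3f}")
print(f"iterated slope: {beta_hat[1]:.3f} after {n_iter} iterations")
```

With this centering the intercept is not separately pinned down, so the sketch is only meant to illustrate the slope estimate. The paper's treatment of the constant and of the choice of hyper-parameter should be consulted for actual use.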

Model Selection Procedure and Specification Tests

The paper stresses model validation and develops specification tests that assess whether an estimator's assumptions are compatible with the pattern of zeros in the data. These tests help researchers identify when Poisson, log-linear, or other conditions are appropriate.

The model selection procedure builds on these specification tests, allowing researchers to choose the iOLS hyper-parameter that aligns most closely with the observed data patterns. The selection seeks to balance bias and variance so that elasticity or semi-elasticity estimates remain robust in applied settings.

Applications and Simulations

The paper provides extensive simulations illustrating the performance of iOLS across different data-generating processes, comparing it with existing methods such as Poisson regression and the 'popular fix'. The simulations demonstrate the versatility of iOLS in delivering consistent estimates across various scenarios, particularly when facing challenges like heteroskedasticity and correlation between zeros and positive values.

Moreover, the paper showcases empirical applications in high-impact economic studies, highlighting the flexibility and practical advantages of using iOLS. These include handling zeros in datasets involving international trade volumes and regional development indicators—fields where zero observations are prevalent.

Conclusion

The paper addresses the long-standing problem of zeros in log-linear regression models. By introducing iOLS, a flexible estimator, and pairing it with a model selection procedure, the authors give researchers a toolkit for econometric analysis when the dependent variable contains zeros. The paper advances the theoretical understanding of the problem and offers practical guidance for applied work in econometrics and related fields.