A Hierarchical Framework for Correcting Under-Reporting in Count Data (1809.00544v1)

Published 3 Sep 2018 in stat.AP

Abstract: Tuberculosis poses a global health risk and Brazil is among the top twenty countries by absolute mortality. However, this epidemiological burden is masked by under-reporting, which impairs planning for effective intervention. We present a comprehensive investigation and application of a Bayesian hierarchical approach to modelling and correcting under-reporting in tuberculosis counts, a general problem arising in observational count data. The framework is applicable to fully under-reported data, relying only on an informative prior distribution for the mean reporting rate to supplement the partial information in the data. Covariates are used to inform both the true count generating process and the under-reporting mechanism, while also allowing for complex spatio-temporal structures. We present several sensitivity analyses based on simulation experiments to aid the elicitation of the prior distribution for the mean reporting rate and decisions relating to the inclusion of covariates. Both prior and posterior predictive model checking are presented, as well as a critical evaluation of the approach.

Summary

Overview of a Hierarchical Framework for Correcting Under-Reporting in Count Data

The paper "A Hierarchical Framework for Correcting Under-Reporting in Count Data" by Oliver Stoner, Theo Economou, and Gabriela Drummond presents a statistical methodology for correcting under-reporting in count data, with an application to tuberculosis (TB) incidence in Brazil. Under-reporting is a pervasive issue that can distort statistical inference and lead to misallocated resources in public health interventions; the paper addresses it with a Bayesian hierarchical framework.

The authors propose a flexible hierarchical model that accounts for under-reported counts by incorporating an informative prior distribution for the mean reporting rate. This methodology diverges from traditional censored likelihood approaches by estimating the severity of under-reporting through the reporting probability. The model also integrates covariates related to the true count generating process and the under-reporting mechanism, enabling comprehensive predictive analysis of true incidence rates.
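As a rough illustration of the structure described above, a model of this kind can be written as a Poisson process for the true counts combined with binomial thinning for the reported counts. The symbols below ($z_i$ for reported counts, $y_i$ for true counts, $\lambda_i$, $\pi_i$, and the covariate vectors $\mathbf{x}_i$, $\mathbf{w}_i$) are notation assumed for this sketch; the paper's own specification may differ, for example by adding offsets or spatio-temporal random effects.

```latex
\begin{align*}
z_i \mid y_i &\sim \operatorname{Binomial}(y_i, \pi_i)
  && \text{reported count, thinned from the true count} \\
y_i &\sim \operatorname{Poisson}(\lambda_i)
  && \text{true (unobserved) count} \\
\log \lambda_i &= \alpha_0 + \mathbf{x}_i^{\top} \boldsymbol{\alpha}
  && \text{covariates for the count generating process} \\
\operatorname{logit}(\pi_i) &= \beta_0 + \mathbf{w}_i^{\top} \boldsymbol{\beta}
  && \text{covariates for the reporting mechanism}
\end{align*}
```

Under these assumptions the reported counts are marginally $z_i \sim \operatorname{Poisson}(\pi_i \lambda_i)$, and an informative prior on the intercept $\beta_0$ plays the role of the prior on the mean reporting rate.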

Strong Numerical Results and Claims

A core claim of the paper is that the Bayesian hierarchical framework can quantify and correct under-reporting, yielding more reliable estimates of the true counts. Because the model is fully Bayesian, it produces complete predictive distributions for the true counts, which expose the uncertainty that remains after correcting for under-reporting. In the TB application, the model indicates that areas with low treatment timeliness have significantly lower reporting probabilities.
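To make the idea of a predictive distribution for true counts concrete, here is a minimal sketch assuming a Poisson model for true counts with binomial under-reporting. Under that assumption the unreported cases are themselves Poisson with mean (1 - reporting probability) x rate, independent of the reported ones. The function names and the numbers (30 reported cases, a rate of 40, a reporting probability of 0.75) are hypothetical, and the rate and reporting probability are held fixed here, whereas a full Bayesian fit would average over their posterior draws.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def predict_true_count(observed, rate, report_prob, rng, draws=2000):
    """Draw true counts given one observed count, for a fixed rate and reporting probability.

    Assumption: Poisson true counts with binomial thinning, so the unreported
    cases are Poisson((1 - report_prob) * rate) and true = observed + unreported.
    """
    missed_rate = (1.0 - report_prob) * rate
    return [observed + sample_poisson(missed_rate, rng) for _ in range(draws)]


rng = random.Random(1)  # fixed seed so the sketch is reproducible
# Hypothetical inputs: 30 reported cases, rate 40, reporting probability 0.75.
samples = sorted(predict_true_count(observed=30, rate=40.0, report_prob=0.75, rng=rng))
mean = sum(samples) / len(samples)
lower = samples[int(0.025 * len(samples))]
upper = samples[int(0.975 * len(samples)) - 1]
print(f"predictive mean {mean:.1f}, central 95% interval ({lower}, {upper})")
```

Every draw is at least as large as the reported count, and the spread of the draws reflects the uncertainty about how many cases went unreported.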

Practical and Theoretical Implications

The implications of this research are substantial for epidemiological studies and public health policy. By characterizing under-reporting, especially across regions with varying socio-economic conditions, the model gives policymakers corrected estimates of disease burden, enabling more efficient allocation of resources for surveillance and intervention. The framework could be adapted to other regions or diseases where under-reporting is evident, providing a systematic method for assessing and addressing data biases.

Theoretically, this work contributes to the statistical modeling literature by offering a robust approach to handling and correcting under-reported data. By extending previous methodologies with a hierarchical count framework that includes a logistic relationship for reporting probabilities, the paper enhances the ability to characterize the uncertainty inherent in count data.
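Part of why a framework like this leans on prior information can be seen from the likelihood of the reported counts alone. The sketch below assumes a simple Poisson model with binomial under-reporting and no covariates, in which case the reported counts depend on the rate and the reporting probability only through their product. The counts and parameter values are made up for illustration.

```python
import math


def log_poisson_pmf(k, mean):
    """Log of the Poisson probability mass function at k."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def log_lik_reported(counts, rate, report_prob):
    """Log-likelihood of reported counts; a thinned Poisson has mean report_prob * rate."""
    return sum(log_poisson_pmf(z, report_prob * rate) for z in counts)


reported = [28, 31, 35, 27, 30]  # hypothetical reported counts

# Two very different explanations of the same data:
ll_high_reporting = log_lik_reported(reported, rate=40.0, report_prob=0.75)
ll_low_reporting = log_lik_reported(reported, rate=60.0, report_prob=0.50)

# Both settings imply a reported mean of 30, so the data cannot tell them apart.
print(math.isclose(ll_high_reporting, ll_low_reporting))
```

In this simplified setting the data alone cannot separate a low rate with good reporting from a high rate with poor reporting, which is why an informative prior on the mean reporting rate, and covariates that act on only one of the two components, carry so much weight.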

Future Developments

The paper suggests avenues for further research, notably the exploration of Bayesian model averaging to address the uncertainty in covariate classification between under-reporting and count generating processes more rigorously. Additionally, the development of tools for eliciting informative priors, perhaps combining empirical data from validation studies, could improve the robustness of future applications of the model.
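One simple way such elicitation could work, offered here only as a sketch and not as the paper's procedure, is to convert an expert's plausible range for the mean reporting rate into a normal prior on the logit scale. The helper name and the example range of 0.70 to 0.90 are hypothetical.

```python
import math


def logit(p):
    """Map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_prior_from_interval(lower, upper, z=1.96):
    """Return (mean, sd) of a normal prior on the logit scale.

    The prior's central 95% interval maps back to (lower, upper) on the
    probability scale. Assumes the elicited range is a central 95% interval.
    """
    lo, hi = logit(lower), logit(upper)
    return (lo + hi) / 2.0, (hi - lo) / (2.0 * z)


# Hypothetical elicited range for the mean reporting rate.
prior_mean, prior_sd = normal_prior_from_interval(0.70, 0.90)
print(f"logit-scale prior: mean {prior_mean:.3f}, sd {prior_sd:.3f}")
print(f"implied median reporting rate: {expit(prior_mean):.3f}")
```

Checking the implied interval on the probability scale, as in a prior predictive check, is a quick way to confirm the prior says what the expert intended.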

Looking further ahead, as statistical models are increasingly integrated into broader data science and AI pipelines, future work could use such techniques to automate the identification and correction of under-reporting across large datasets, speeding up the production of reliable data for decision-making.

In conclusion, this paper provides a detailed statistical framework for managing under-reporting in count data, focusing on TB incidence in Brazil, with significant implications for public health and data analysis methodologies. The hierarchical model demonstrates substantial promise in enhancing our understanding and management of epidemiological data challenges.