Modelling Under-Reported Data: Pitfalls of Naïve Approaches and a New Statistical Framework for Epidemic Curve Reconstruction
Abstract: Count-valued autoregressions are widely used to analyse time-series of reported infectious-disease cases because of their close connection with discrete-time transmission models. However, when such models are applied directly to under-reported case counts, their mechanistic interpretation can break down. We establish new theoretical results quantifying the consequences of ignoring under-reporting in these models. To address this issue, reported cases are often modelled as a binomially thinned version of an underlying count process, but such models are difficult to fit because the unobserved true counts are serially correlated and integer-valued. We develop a new statistical framework for under-reported infectious-disease data that uses a normal-normal approximation to a broad class of thinned count autoregressions and then accurately maps this continuous process back to the integers. Through simulations and applications to rotavirus incidence in a German state and Covid-19 incidence in English conurbations, we demonstrate that our approach both retains the mechanistic appeal of thinned autoregressions and substantially simplifies inference.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.