Detecting Influenza Epidemics on Twitter (2111.10675v1)
Abstract: This paper presents a predictive model for Influenza-Like Illness (ILI) based on Twitter traffic. We collect tweets matching a set of keywords drawn from the Wikipedia page on influenza and, using real ILI data from the Greek CDC, perform feature selection over all words appearing in three years' worth of tweets. We select a small set of words that are highly correlated with the ILI score and train a regression model to predict the ILI score from these word features. We deploy this model in a streaming application and feed the resulting time series to FluHMM, an existing model for predicting the phases of an epidemic. We find that Twitter traffic is a good source of information and can provide early warnings compared with the existing sentinel protocol, which relies on a network of associated physicians across Greece.
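The selection-then-regression step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the correlation measure or the regression model, so Pearson correlation (ranked by absolute value) and ordinary least squares with an intercept are assumptions here. The word names, the weekly frequencies, and the helper functions `select_words`, `fit_ols`, and `predict` are hypothetical toy examples.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_words(word_freqs: Dict[str, List[float]], ili: List[float], k: int) -> List[str]:
    """Keep the k words whose weekly frequency has the highest |r| with the ILI score (assumed criterion)."""
    ranked = sorted(word_freqs, key=lambda w: abs(pearson(word_freqs[w], ili)), reverse=True)
    return ranked[:k]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system a @ x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares with intercept via the normal equations; returns [b0, b1, ..., bk]."""
    X = [[1.0] + r for r in rows]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef: List[float], features: List[float]) -> float:
    """Apply the fitted linear model to one week's selected word frequencies."""
    return coef[0] + sum(c * f for c, f in zip(coef[1:], features))


# Hypothetical weekly word frequencies and ILI scores (toy data, not from the paper).
words = {
    "fever": [1, 2, 4, 7, 5, 3],
    "cough": [2, 1, 3, 4, 6, 2],
    "football": [5, 5, 4, 5, 5, 4],
}
ili = [4, 5.5, 10.5, 17, 14, 8]  # constructed as 1 + 2*fever + 0.5*cough

chosen = select_words(words, ili, k=2)
rows = [[words[w][t] for w in chosen] for t in range(len(ili))]
coef = fit_ols(rows, ili)
print(chosen)                         # ['fever', 'cough']
print([round(c, 6) for c in coef])    # [1.0, 2.0, 0.5]
```

In a streaming deployment, `predict` would be applied to each new week's word counts to produce the ILI time series that is then passed to a phase model such as FluHMM; that hand-off is not shown here.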