
Predicting antimicrobial drug consumption using web search data (1803.03532v1)

Published 9 Mar 2018 in cs.IR

Abstract: Consumption of antimicrobial drugs, such as antibiotics, is linked with antimicrobial resistance. Surveillance of antimicrobial drug consumption is therefore an important element in dealing with antimicrobial resistance. Many countries lack sufficient surveillance systems. Usage of web mined data therefore has the potential to improve current surveillance methods. To this end, we study how well antimicrobial drug consumption can be predicted based on web search queries, compared to historical purchase data of antimicrobial drugs. We present two prediction models (linear Elastic Net, and non-linear Gaussian Processes), which we train and evaluate on almost 6 years of weekly antimicrobial drug consumption data from Denmark and web search data from Google Health Trends. We present a novel method of selecting web search queries by considering diseases and drugs linked to antimicrobials, as well as professional and layman descriptions of antimicrobial drugs, all of which we mine from the open web. We find that predictions based on web search data are marginally more erroneous but overall on a par with predictions based on purchases of antimicrobial drugs. This marginal difference corresponds to less than 1 percentage point of mean absolute error in weekly usage. Best predictions are reported when combining both web search and purchase data. This study contributes a novel alternative solution to the real-life problem of predicting (and hence monitoring) antimicrobial drug consumption, which is particularly valuable in countries/states lacking centralised and timely surveillance systems.

Summary

  • The paper presents a novel method leveraging web search data alongside historical sales to predict antimicrobial drug usage.
  • It compares two models—linear Elastic Net and non-linear Gaussian Processes—with Elastic Net outperforming when autoregressive terms are added.
  • Enhanced prediction accuracy was achieved by extending lag periods, highlighting the impact of long-term seasonal trends in drug consumption.

Predicting Antimicrobial Drug Consumption Using Web Search Data

Antimicrobial resistance is a significant public health challenge, driven largely by the overuse and inappropriate use of antimicrobial drugs such as antibiotics. Effective surveillance of antimicrobial drug consumption is essential to mitigating it, yet many countries lack detailed and timely surveillance systems. This paper by Hansen et al. investigates whether antimicrobial drug consumption can be predicted from web search data, as an alternative to traditional surveillance based on historical purchase data.

Data and Methodology

The paper leverages three primary data sources: antimicrobial drug sales data from Denmark, web search query frequency data from the Google Health Trends API, and textual data related to antimicrobials available online. The sales data includes weekly antimicrobial drug purchases over several years, which serves as the ground truth for consumption rates. The search query data covers the same period and is used to identify potential indicators of drug consumption behavior.

The authors present two prediction models: a linear Elastic Net model and a non-linear Gaussian Processes model. Elastic Net is chosen for its balance between L1 and L2 regularization, which makes it suitable for high-dimensional data where the number of predictors can be very large. Gaussian Processes, on the other hand, are ideal for capturing non-linear relationships in the data.
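To make the L1/L2 balance concrete, here is a minimal sketch of an Elastic Net fitted by coordinate descent. It is not the authors' implementation: the objective follows the common convention (1/2n)*||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1 - l1_ratio)*||w||^2, and the names fit_elastic_net, alpha, l1_ratio, and the toy data are assumptions made for illustration. Inputs are assumed to be centred, so no intercept is fitted.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the update)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for an Elastic Net on centred data (no intercept).

    X is a list of rows, y a list of targets. alpha sets the overall penalty
    strength and l1_ratio the share of it given to the L1 term.
    """
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    residual = list(y)  # equals y - Xw while all weights are zero
    columns = [[row[j] for row in X] for j in range(p)]
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Correlation of feature j with the residual, with feature j's
            # own current contribution added back in.
            rho = sum(c * (r + weights[j] * c) for c, r in zip(col, residual)) / n
            denom = sum(c * c for c in col) / n + alpha * (1.0 - l1_ratio)
            if denom == 0.0:
                continue  # all-zero feature and no L2 penalty: weight stays 0
            new_w = soft_threshold(rho, alpha * l1_ratio) / denom
            delta = new_w - weights[j]
            residual = [r - delta * c for r, c in zip(residual, col)]
            weights[j] = new_w
    return weights


# Toy data (hypothetical): the target depends only on the first feature.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.5], [1.0, -1.0], [2.0, 0.5]]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]
weights = fit_elastic_net(X, y)
print(weights)
```

On this toy data the L1 term sets the weight of the irrelevant second feature to exactly zero, while the first weight is shrunk slightly below its unpenalised value of 2. That sparsity is what makes the model practical when there are many candidate search queries.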

To select relevant search queries, the paper mines various online sources, including layman and professional descriptions of diseases and drugs linked to antimicrobials. The selected queries are used to construct time series data, which are then fed into the prediction models.

Key Findings

The performance of the prediction models is evaluated against several criteria, including Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). Prediction offsets of 0, 4, 8, and 12 weeks are tested to assess the models' ability to forecast future consumption.
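The two error measures and the idea of a prediction offset can be sketched as follows. The helper names are hypothetical, and the alignment convention (features observed in week t are paired with consumption in week t + offset) is an assumption about how an offset would typically be applied, not a detail taken from the paper.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the errors, in the target's units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: like MAE but penalises large errors more."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def align_for_offset(features, target, offset):
    """Pair features from week t with the target from week t + offset."""
    if offset == 0:
        return list(zip(features, target))
    return list(zip(features[:-offset], target[offset:]))


# Hypothetical weekly values, for illustration only.
actual = [10, 12, 14]
predicted = [11, 12, 12]
print(mae(actual, predicted), rmse(actual, predicted))

pairs = align_for_offset([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], offset=2)
print(pairs)
```

A larger offset leaves fewer aligned pairs to train and evaluate on, which is one reason forecasts further ahead tend to be harder.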

Main Results:

  1. Comparison of Data Sources:
    • Predictions based solely on web search data were marginally more erroneous than, but overall on a par with, those based on historical purchase data. The difference corresponds to less than 1 percentage point of mean absolute error in weekly usage.
    • Combining web search data with historical purchase data yielded the most accurate predictions.
  2. Prediction Models:
    • Elastic Net generally outperformed Gaussian Processes, particularly when autoregressive terms were included, likely due to the linear nature of the relationships captured during query selection.
    • Gaussian Processes, while less effective overall, still provided valuable insights, especially when longer-term, non-linear trends were considered.
  3. Query Sets and Lags:
    • The best performance was observed when using descriptions of antimicrobial drugs targeted at laymen. This may be due to the broad nature of these descriptions, capturing a wide array of correlated search behaviors.
    • Increasing the maximum lag period from 26 to 130 weeks generally improved prediction accuracy, indicating that long-term seasonal trends play a crucial role in drug consumption patterns.
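The autoregressive terms and the maximum lag mentioned above can be illustrated with a small sketch that turns one weekly series into lagged training rows. The function name and the exact row layout are assumptions for illustration; the paper's own feature construction may differ.

```python
def make_lagged_rows(series, max_lag):
    """Build (lags, target) rows from a weekly series.

    For each week t >= max_lag the features are the previous max_lag values,
    most recent first: [series[t-1], ..., series[t-max_lag]].
    """
    rows = []
    for t in range(max_lag, len(series)):
        lags = [series[t - k] for k in range(1, max_lag + 1)]
        rows.append((lags, series[t]))
    return rows


# Small hypothetical series to show the layout.
rows = make_lagged_rows([1, 2, 3, 4, 5], max_lag=2)
print(rows)

# A longer maximum lag gives the model a view of past seasons but leaves
# fewer usable rows; 300 weeks is a made-up length for illustration.
weekly = list(range(300))
print(len(make_lagged_rows(weekly, 26)), len(make_lagged_rows(weekly, 130)))
```

With a 130-week maximum lag each row covers more than two full years of history, so yearly seasonality is visible to the model, at the cost of a smaller training set.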

Implications and Future Work

The paper demonstrates that web search data can serve as a viable proxy for antimicrobial drug consumption. This is particularly valuable for regions lacking comprehensive surveillance systems: because web data is readily available, it could speed up the rollout of monitoring systems and support both the timely identification of misuse and public health planning.

The results suggest that web data alone is only slightly less accurate than historical purchase data, and that combining the two gives the most accurate predictions. The approach of mining online resources for query selection seems promising and could be applied to other public health surveillance tasks.

Future Directions

Future research could explore several avenues:

  • Other Regions: Applying this methodology outside Denmark and comparing the results could provide further validation.
  • Real-time Monitoring: Monitoring systems built on web data could track drug consumption patterns in near real time.
  • Incorporation of More Data Sources: Incorporating additional data sources, such as social media activity and real-time health reports, could enhance the predictive power of the models.
  • Advanced Model Integration: Exploring more advanced machine learning models and integrating them into existing public health frameworks could refine prediction accuracy further.

In conclusion, this paper presents a rigorous approach to predicting antimicrobial drug consumption using web search data, offering a complementary tool for public health surveillance and intervention in the face of rising antimicrobial resistance.