- The paper demonstrates that sudden spikes in search query volumes can predict FDA drug recalls with an AUC of up to 0.791.
- It employs detailed time-series feature engineering and a bagging approach with k-means clustering to identify early warning signals.
- The findings support using internet search data as a real-time, cost-effective supplement to traditional pharmacovigilance systems.
Predicting Drug Recalls from Internet Search Engine Queries
The paper "Predicting drug recalls from Internet search engine queries" by Elad Yom-Tov investigates whether search data can be used to detect defective pharmaceutical batches early. Specifically, it asks whether trends in search queries precede FDA drug recall events and could therefore strengthen early warning systems for faulty drugs.
The study's primary data are queries submitted to the Bing search engine from across the USA in 2015, compared against recall notifications issued by the FDA in the same year. The dataset covered 5,195 pharmaceutical drugs and their mentions in search queries. State-level attributes such as query volume and its rate of change were used to predict impending recalls.
Methodology
Data Extraction and Filtering
The data were extracted from Bing's 2015 search logs. Each record held an anonymized user identifier, the query text, the date, and the US state of origin. Queries mentioning any of 5,195 drugs were retained, and the analysis was restricted to the 373 drugs with at least 1,000 mentions. The FDA's Recall Enterprise System (RES) served as ground truth, providing the state, date, drug, and classification (Class I, II, or III) of each recall.
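The filtering step can be sketched as follows. This is a minimal illustration, not the paper's pipeline. The record layout `(user_id, query_text, date, state)`, the function name `count_drug_mentions`, and the case-insensitive, whole-word matching rule are all assumptions. Mentions are also assumed to be counted per query.

```python
import re
from collections import Counter
from datetime import date

def count_drug_mentions(logs, drug_names, min_mentions=1000):
    """Count queries per (drug, state, day), keeping only drugs with at
    least `min_mentions` queries in total.

    `logs` holds (user_id, query_text, query_date, us_state) tuples; this
    record layout is a hypothetical one chosen for illustration.
    """
    # Assumed matching rule: case-insensitive, whole-word drug-name match.
    patterns = {d: re.compile(r"\b" + re.escape(d.lower()) + r"\b") for d in drug_names}
    daily = Counter()   # (drug, state, day) -> number of matching queries
    totals = Counter()  # drug -> total number of matching queries
    for _user_id, text, day, state in logs:
        lowered = text.lower()
        for drug, pattern in patterns.items():
            if pattern.search(lowered):
                daily[(drug, state, day)] += 1
                totals[drug] += 1
    # Drop drugs below the mention threshold (1,000 in the paper).
    kept = {drug for drug, n in totals.items() if n >= min_mentions}
    return {key: n for key, n in daily.items() if key[0] in kept}

logs = [
    ("u1", "Lipitor side effects", date(2015, 3, 2), "TX"),
    ("u2", "lipitor recall", date(2015, 3, 2), "TX"),
    ("u3", "aspirin dose", date(2015, 3, 3), "CA"),
]
print(count_drug_mentions(logs, ["lipitor", "aspirin"], min_mentions=2))
# → {('lipitor', 'TX', datetime.date(2015, 3, 2)): 2}
```

The resulting per-state daily counts are the raw series from which time-series attributes can then be computed.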
Feature Engineering
The core of the method is a set of 20 time-series attributes computed for each drug-state-day combination:
- Slope of query volume over the past 1 to 7 weeks.
- Query spike ratios: the daily volume relative to the volume over the past 7 and 30 days, and the 7-day volume relative to the 30-day volume.
- Analogous attributes specific to queries mentioning symptoms.
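Below is a minimal sketch of one drug-state-day feature vector. It rests on several assumptions. "Slope" is taken as the least-squares slope of daily counts over the last 1 to 7 weeks. The three spike ratios are read as today versus the mean of the previous 7 days, today versus the mean of the previous 30 days, and the 7-day mean versus the 30-day mean. A hypothetical smoothing constant `eps` avoids division by zero. Under this reading, 7 slopes and 3 ratios, computed once for all drug queries and once for symptom queries, give the 20 attributes. All function names are hypothetical.

```python
from statistics import mean

def ls_slope(values):
    """Ordinary least-squares slope of `values` against day index 0..n-1."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den

def spike_features(counts, eps=1.0):
    """7 slopes + 3 spike ratios for the last day of `counts` (daily counts, oldest first).

    The window definitions and the smoothing constant `eps` are assumptions.
    """
    if len(counts) < 49:
        raise ValueError("need at least 7 weeks (49 days) of daily counts")
    today = counts[-1]
    prev7 = mean(counts[-8:-1])    # mean of the 7 days before today
    prev30 = mean(counts[-31:-1])  # mean of the 30 days before today
    # Slope of daily volume over the last 1, 2, ..., 7 weeks (including today).
    slopes = [ls_slope(counts[-7 * weeks:]) for weeks in range(1, 8)]
    ratios = [
        (today + eps) / (prev7 + eps),   # today vs. past week
        (today + eps) / (prev30 + eps),  # today vs. past month
        (prev7 + eps) / (prev30 + eps),  # past week vs. past month
    ]
    return slopes + ratios

def feature_vector(all_counts, symptom_counts):
    """20 attributes: 10 from all drug queries, 10 from drug + symptom queries."""
    return spike_features(all_counts) + spike_features(symptom_counts)

history = [10] * 48 + [50]  # a flat series with a spike on the last day
features = feature_vector(history, history)
print(len(features), round(features[7], 3))  # → 20 4.636
```

A sudden spike shows up as spike ratios well above 1 and as positive short-window slopes. A steady rise instead produces positive long-window slopes with ratios near 1.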
Prediction Model
Because recalls are rare, a bagging approach was used. The majority class (non-recalls) was partitioned into 500 clusters with k-means, and a separate linear predictor was trained for each cluster against the minority class (recalls). Performance was evaluated with two metrics. The ROC curve shows the trade-off between true and false positive rates. Lift measures how much more frequent recalls are among the highest-scored cases than in the data as a whole, which is a useful view for rare events.
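The cluster-based bagging scheme can be sketched in plain Python. Logistic regression stands in for the unspecified linear predictor. Averaging the per-cluster decision values is an assumed way to combine the models. `kmeans`, `train_linear`, `fit_cluster_bagging`, and `score` are hypothetical names.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the cluster index of each point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters keep their center.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_linear(X, y, epochs=200, lr=0.1):
    """Logistic regression by batch gradient descent (stand-in linear predictor)."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def fit_cluster_bagging(X, y, k=500, seed=0):
    """Cluster the non-recalls (y == 0) with k-means and train one linear
    model per cluster against all recalls (y == 1)."""
    negatives = [x for x, lab in zip(X, y) if lab == 0]
    positives = [x for x, lab in zip(X, y) if lab == 1]
    labels = kmeans(negatives, min(k, len(negatives)), seed=seed)
    models = []
    for c in sorted(set(labels)):
        cluster = [x for x, lab in zip(negatives, labels) if lab == c]
        X_c = cluster + positives
        y_c = [0] * len(cluster) + [1] * len(positives)
        models.append(train_linear(X_c, y_c))
    return models

def score(models, x):
    """Average decision value across per-cluster models (assumed combination rule)."""
    return sum(sum(wj * xj for wj, xj in zip(w, x)) + b for w, b in models) / len(models)

X = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1], [10, -5], [10.1, -5]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
models = fit_cluster_bagging(X, y, k=2)
print(score(models, [10, -5]) > score(models, [5, 5]))  # → True
```

Each per-cluster model sees every recall but only a fraction of the non-recalls. This reduces the class imbalance any single predictor faces, while the clusters keep each model's negatives locally coherent rather than randomly undersampled.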
Results
The paper found that recalls could be predicted one day ahead with an AUC of 0.791 and with a lift of approximately 6 among the top 5% of cases ranked by the model. Predictive accuracy declined as the prediction horizon lengthened. Sudden spikes in query volume were particularly indicative of upcoming recalls. Class II (medium-risk) recalls and recalls of prescription drugs were identified more readily than others.
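The two reported metrics can be computed as below. This is a straightforward sketch, not the paper's evaluation code. AUC is computed as the probability that a randomly chosen recall is scored above a randomly chosen non-recall, using an O(n²) pairwise form suited only to small examples. Lift at a fraction is the recall rate among the top-scored cases divided by the overall recall rate. A lift of about 6 at 5% therefore means recalls were roughly six times more common among the top 5% of scored cases than in the data overall. The function names are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(random recall scores above random non-recall); ties count 1/2.

    This pairwise form is O(n^2) and meant only for small illustrations.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def lift_at(scores, labels, fraction=0.05):
    """Recall rate among the top `fraction` of scored cases / overall recall rate."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    top_n = max(1, round(fraction * len(ranked)))
    top_rate = sum(lab for _, lab in ranked[:top_n]) / top_n
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

print(roc_auc([0.9, 0.8, 0.7, 0.1], [1, 0, 1, 0]))  # → 0.75
scores = list(range(20))
labels = [0] * 19 + [1]  # only the highest-scored case is a recall
print(lift_at(scores, labels))  # → 20.0
```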
Implications
Theoretical Implications
The findings suggest that aggregated internet search data can serve as an additional layer of pharmacovigilance. Identifying recall patterns from search behavior adds a predictive signal beyond that of traditional adverse event reporting systems.
Practical Implications
In practice, the method offers a cost-effective, real-time surveillance mechanism that could reduce the health risks of delayed drug recalls. Earlier detection would speed the removal of defective drugs from the market and limit the financial losses manufacturers incur when interventions come late.
Future Directions
Future research should aim to extend this analysis over longer timeframes to verify stability and improve the accuracy of predictions. Additionally, exploring more granular attributes, such as interactions between search queries for drugs and specific adverse reactions, could enhance predictive power. Integrating public health authority systems with readily available data sources like Google Trends could simplify the deployment of such predictive models in routine drug safety monitoring.
In conclusion, this paper highlights the viability of using internet search query data as an early warning system for drug recalls, marking a significant step towards proactive pharmacovigilance.