Predicting drug recalls from Internet search engine queries (1611.08848v1)

Published 27 Nov 2016 in cs.IR and stat.AP

Abstract: Batches of pharmaceuticals are sometimes recalled from the market when a safety issue or a defect is detected in specific production runs of a drug. Such problems are usually detected when patients or healthcare providers report abnormalities to medical authorities. Here we test the hypothesis that defective production lots can be detected earlier by monitoring queries to Internet search engines. We extracted queries from the USA to the Bing search engine which mentioned one of 5,195 pharmaceutical drugs during 2015 and all recall notifications issued by the Food and Drug Administration (FDA) during that year. By using attributes that quantify the change in query volume at the state level, we attempted to predict if a recall of a specific drug will be ordered by the FDA in a time horizon ranging from one to 40 days in the future. Our results show that future drug recalls can indeed be identified with an AUC of 0.791 and a lift at 5% of approximately 6 when predicting a recall will occur one day ahead. This performance degrades as prediction is made for longer periods ahead. The most indicative attributes for prediction are sudden spikes in query volume about a specific medicine in each state. Recalls of prescription drugs and those estimated to be of medium risk are more likely to be identified using search query data. These findings suggest that aggregated Internet search engine data can be used to facilitate early warning of faulty batches of medicines.



Summary

The paper demonstrates that sudden spikes in search query volumes can predict FDA drug recalls with an AUC of up to 0.791.
It employs detailed time-series feature engineering and a bagging approach with k-means clustering to identify early warning signals.
The findings support using internet search data as a real-time, cost-effective supplement to traditional pharmacovigilance systems.

Predicting Drug Recalls from Internet Search Engine Queries

The paper "Predicting drug recalls from Internet search engine queries" by Elad Yom-Tov investigates whether search data can support the early detection of defective pharmaceutical batches. Specifically, it asks whether changes in search query volume precede FDA drug recall notifications, and could therefore strengthen early warning systems for faulty drugs.

The study draws on queries submitted to the Bing search engine from the USA during 2015, which were compared with the recall notifications issued by the FDA in the same year. Queries were matched against a list of 5,195 pharmaceutical drugs, and attributes quantifying query volume and its rate of change at the state level were used to predict impending recalls.
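Predicting a recall "h days ahead" requires a label for every drug-state-day observation. The sketch below is one plausible reading, not the paper's confirmed definition: it assumes an observation is positive only when a recall of the same drug in the same state is issued exactly `horizon_days` after the observation day. The function name, tuple layout, and drug names are hypothetical.

```python
from datetime import date, timedelta


def make_labels(observations, recalls, horizon_days):
    """Label (drug, state, day) observations for one prediction horizon.

    observations: iterable of (drug, state, day) tuples, day a datetime.date
    recalls: set of (drug, state, day) tuples taken from recall notifications
    horizon_days: number of days ahead the prediction targets (1 to 40 in the paper)
    """
    labels = {}
    for drug, state, day in observations:
        # Assumption: positive only if the recall is issued exactly horizon_days later
        target = (drug, state, day + timedelta(days=horizon_days))
        labels[(drug, state, day)] = int(target in recalls)
    return labels


# Toy example with hypothetical drug names
recalls = {("drug_a", "TX", date(2015, 3, 10))}
observations = [
    ("drug_a", "TX", date(2015, 3, 9)),
    ("drug_a", "TX", date(2015, 3, 8)),
    ("drug_a", "CA", date(2015, 3, 9)),
]
labels = make_labels(observations, recalls, horizon_days=1)
```

Repeating the labeling for each horizon from 1 to 40 days yields one training set per horizon, which is how performance can be compared across horizons.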

Methodology

Data Extraction and Filtering

The data were extracted from Bing's 2015 search logs, capturing an anonymized user identifier, the query text, the date, and the US state of origin of each query. Queries mentioning any of the 5,195 drugs were retained, and the analysis was restricted to the 373 drugs with at least 1,000 mentions. The FDA's Recall Enterprise System (RES) served as ground truth, providing the drug, date, and affected states of each recall together with its classification as a Class I, II, or III recall.
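The filtering step can be sketched as follows. Only the 1,000-mention threshold comes from the text; the record layout (a dict with `user`, `text`, `date`, `state`), the whole-word case-insensitive regex matching, and the function name `filter_drug_queries` are assumptions for illustration. The paper may have matched brand names, synonyms, or misspellings differently.

```python
import re
from collections import Counter


def filter_drug_queries(queries, drug_names, min_mentions=1000):
    """Keep queries that mention a listed drug, then drop rarely mentioned drugs.

    queries: iterable of dicts with hypothetical keys 'user', 'text', 'date', 'state'
    drug_names: drug names to match as whole words, case-insensitively
    Returns (matched records tagged with a lowercased 'drug', set of retained drugs).
    """
    # Longest names first so that multi-word names win over their prefixes
    names = sorted(drug_names, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b", re.IGNORECASE)

    matched, counts = [], Counter()
    for q in queries:
        # A query naming the same drug twice counts as one mention
        for drug in {m.lower() for m in pattern.findall(q["text"])}:
            matched.append({**q, "drug": drug})
            counts[drug] += 1

    kept = {drug for drug, n in counts.items() if n >= min_mentions}
    return [q for q in matched if q["drug"] in kept], kept
```

A query that mentions two different drugs is tagged once for each, so it contributes to both drugs' time series.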

Feature Engineering

The core of the method is a set of 20 time-series attributes computed for each combination of drug, state, and day, covering:

Slope of query volume over the past 1 to 7 weeks.
Query spike ratios comparing daily volumes to those of the past 7, 30, and 7/30 days.
Analogous attributes specific to queries mentioning symptoms.
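The attribute list above can be sketched as code under one interpretation, which is an assumption rather than the paper's stated definitions: each slope is a least-squares fit over the last 1 to 7 weeks of daily counts; the spike ratios divide the day's count by the mean of the previous 7 and 30 days; and the "7/30" attribute divides the 7-day mean by the 30-day mean. This reading gives 10 attributes per series, so 20 for the drug and symptom series together, which matches the count stated above. Window boundaries and the zero-denominator rule are illustrative choices.

```python
def ls_slope(values):
    """Ordinary least-squares slope of values regressed on 0, 1, ..., n-1."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den if den else 0.0


def safe_ratio(a, b):
    """a / b, or 0.0 when the denominator is zero (no earlier queries)."""
    return a / b if b else 0.0


def series_features(counts, t):
    """10 attributes for day t of one daily query-count series (index = day)."""
    if t < 48:
        raise ValueError("need at least 7 weeks (49 days) of history up to day t")
    feats = {}
    for weeks in range(1, 8):
        window = counts[t - 7 * weeks + 1 : t + 1]  # the last 7*weeks days, ending at t
        feats[f"slope_{weeks}w"] = ls_slope(window)
    mean7 = sum(counts[t - 7 : t]) / 7     # the 7 days before t
    mean30 = sum(counts[t - 30 : t]) / 30  # the 30 days before t
    feats["spike_7"] = safe_ratio(counts[t], mean7)
    feats["spike_30"] = safe_ratio(counts[t], mean30)
    feats["ratio_7_30"] = safe_ratio(mean7, mean30)
    return feats


def drug_state_features(drug_counts, symptom_counts, t):
    """20 attributes: the 10 above for drug queries and for symptom queries."""
    feats = {f"drug_{k}": v for k, v in series_features(drug_counts, t).items()}
    feats.update({f"symptom_{k}": v for k, v in series_features(symptom_counts, t).items()})
    return feats
```

A sudden jump on day t shows up mainly in `spike_7` and `spike_30`, while gradual growth shows up in the slopes.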

Prediction Model

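To predict recalls, the paper used a bagging approach: the majority class (non-recalls) was partitioned with k-means clustering (k=500), and a linear predictor was trained for each cluster against the minority class (recalls). Performance was evaluated with the receiver operating characteristic (ROC) curve, summarized by its area under the curve (AUC), which captures the trade-off between true and false positive rates, and with lift, which measures how strongly recalls are concentrated among the highest-scored cases and is well suited to rare events.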
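A minimal sketch of the cluster-based bagging described above, in NumPy. The clustering of non-recalls and the one-linear-model-per-cluster structure follow the text; the specific choices here are assumptions: plain Lloyd's k-means, a least-squares linear fit on +1/-1 targets as the "linear predictor", and averaging member outputs to combine them. The paper's exact learner and aggregation rule may differ.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns the cluster index of every row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from each point to each center
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):  # an empty cluster keeps its old center
                centers[j] = members.mean(axis=0)
    return assign


def fit_linear(X, y):
    """Least-squares linear predictor with a bias term."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def fit_cluster_bagging(X, y, k=500, seed=0):
    """One linear model per k-means cluster of negatives, each vs. all positives."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    assign = kmeans(neg, k, seed=seed)
    models = []
    for j in range(k):
        neg_j = neg[assign == j]
        if len(neg_j) == 0:
            continue
        Xj = np.vstack([pos, neg_j])
        yj = np.concatenate([np.ones(len(pos)), -np.ones(len(neg_j))])
        models.append(fit_linear(Xj, yj))
    return np.array(models)


def predict_scores(models, X):
    """Average the outputs of all ensemble members; higher means more recall-like."""
    X = np.asarray(X, dtype=float)
    A = np.hstack([X, np.ones((len(X), 1))])
    return (A @ models.T).mean(axis=1)
```

The design intent is that each member sees every recall but only one slice of the non-recalls, so no single model is swamped by the majority class, and the ensemble still covers all of it.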

Results

Predictions made one day ahead reached an AUC of 0.791, with a lift of approximately 6 at 5%: the 5% of cases ranked highest by the model contained about six times as many recalls as a random 5% sample would. Performance declined as the prediction horizon lengthened. Sudden spikes in query volume were the most indicative attributes, and recalls of prescription drugs and of medium-risk drugs (Class II) were identified more readily.
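To make the two reported metrics concrete, the sketch below computes them from labels and model scores. AUC is computed in its pairwise form (the probability that a randomly chosen recall outscores a randomly chosen non-recall, with ties counted as half), which equals the area under the ROC curve; 0.5 corresponds to chance. Lift at 5% is the recall rate among the top 5% of scores divided by the overall recall rate. The function names are illustrative, and the pairwise comparison uses quadratic memory, so it suits small examples rather than a full year of drug-state-day data.

```python
import numpy as np


def roc_auc(y_true, scores):
    """AUC as P(random positive outscores random negative), ties counting half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def lift_at(y_true, scores, fraction=0.05):
    """Positive rate in the top `fraction` of scores divided by the overall positive rate."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    n_top = max(1, int(round(fraction * len(y_true))))
    top_rate = y_true[order[:n_top]].mean()
    return top_rate / y_true.mean()
```

For example, with 5 recalls among 100 cases, a model that places 3 of them in its top 5 has a lift at 5% of (3/5) / (5/100) = 12.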

Implications

Theoretical Implications

The findings underscore the potential utility of aggregated internet search data as an additional layer for pharmacovigilance. The ability to identify recall patterns based on search behavior introduces a novel predictive dimension that extends beyond the traditional adverse event reporting systems.

Practical Implications

In practice, the method could provide a low-cost, near real-time surveillance channel that may reduce the health risks of delayed drug recalls. Earlier detection would help remove defective drugs from the market sooner and could reduce the financial losses manufacturers incur when interventions come late.

Future Directions

Future research should aim to extend this analysis over longer timeframes to verify stability and improve the accuracy of predictions. Additionally, exploring more granular attributes, such as interactions between search queries for drugs and specific adverse reactions, could enhance predictive power. Integrating public health authority systems with readily available data sources like Google Trends could simplify the deployment of such predictive models in routine drug safety monitoring.

In conclusion, this paper highlights the viability of using internet search query data as an early warning system for drug recalls, marking a significant step towards proactive pharmacovigilance.
