FDA OCMQs: Custom Queries for Drug Safety
- FDA OCMQs are structured grouping tools based on MedDRA Preferred Terms that facilitate adverse event signal detection in regulatory reviews.
- Recent work applies AI-driven NLP, transformer embeddings, and fuzzy matching to automate their construction and the mapping of safety text to MedDRA terms.
- Ontology-based semantic similarity and Bayesian borrowing extend these groupings to quantitative disproportionality analysis, where precision must be balanced against sensitivity.
The FDA Office of New Drugs Custom Medical Queries (OCMQs) are structured tools for grouping related adverse event (AE) concepts to facilitate safety signal detection during regulatory review and post-market surveillance. OCMQs build on MedDRA’s controlled vocabulary, typically at the Preferred Term (PT) level, and leverage advanced semantic and statistical methods to generate, validate, and apply customized AE groupings for quantitative analysis. Their construction and validation have increasingly incorporated automated, AI-based methods, including transformer-derived embedding models, similarity clustering, and ontology-based semantic similarity measures.
1. MedDRA Preferred Terms and the Foundations of OCMQ Construction
MedDRA’s five-level hierarchy (System Organ Class → High-Level Group Term → High-Level Term → Preferred Term → Lowest-Level Term) is the international standard for granular coding of AEs (Vandenhende et al., 8 Dec 2025). PTs are consensus-level terms that aggregate synonymous and closely related LLTs. They are designed to support robust AE signal detection, consistent aggregation of safety data across sources (clinical trials, spontaneous reports, regulatory label text, EHR), and regulatory communication (Painter et al., 26 Mar 2025).
OCMQs use PTs to define AE clusters for querying safety datasets and identifying patterns suggestive of drug risks. OCMQs are needed because Standardised MedDRA Queries (SMQs) may not cover emerging, off-label, or mechanistically nuanced AE constellations.
2. Automated Semantic Mapping of Free Text to MedDRA PTs
Recent advances in NLP have enabled high-throughput, semi-automated mapping from free-text AE reports or drug labels to standardized PT codes. Systems such as PVLens extract labeled safety information from FDA Structured Product Labels (SPLs), employing a stepwise pipeline: XML parsing, substance linking via UMLS and RxNorm/SNOMED CT, text normalization, MedDRA dictionary matching (exact and fuzzy string match, normalized Levenshtein similarity ≥ 0.85), and confidence-weighted scoring (Painter et al., 26 Mar 2025).
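The dictionary-matching step can be illustrated with a minimal sketch. It assumes that "normalized Levenshtein similarity" means one minus the edit distance divided by the longer string's length, which is a common convention but is not confirmed by the source. The term list `TOY_TERMS` and the helpers `normalize` and `match_term` are hypothetical names for illustration, not part of PVLens.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Assumed convention: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (a stand-in for text normalization)."""
    return " ".join(text.lower().split())


def match_term(mention, dictionary, threshold=0.85):
    """Return (term, similarity, match_type) for the best match, or None."""
    query = normalize(mention)
    best = None
    for term in dictionary:
        sim = normalized_similarity(query, normalize(term))
        if best is None or sim > best[1]:
            best = (term, sim)
    if best is None or best[1] < threshold:
        return None
    kind = "exact" if best[1] == 1.0 else "fuzzy"
    return best[0], best[1], kind


# Hypothetical toy dictionary, not an extract of MedDRA.
TOY_TERMS = ["Nausea", "Headache", "Hepatotoxicity", "Rash"]
```

With this toy list, `match_term("hepatoxicity", TOY_TERMS)` is accepted as a fuzzy match (two edits over fourteen characters gives about 0.857), while `match_term("dizziness", TOY_TERMS)` falls below the cutoff and returns `None`.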
Mappings combine exact-string, fuzzy-match, and context-confidence components into a composite score, and thresholding on that score supports expert review. The PVLens methodology, validated against 97 SPLs, achieves F1 ≈ 0.899, recall ≈ 0.985, and precision ≈ 0.826 for AE mapping (Painter et al., 26 Mar 2025). This suggests that high-throughput PT mapping pipelines can substantially reduce manual review effort while retaining high recall, although the lower precision leaves a residual need for expert confirmation.
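As a consistency check, the reported F1 follows from the stated precision $P$ and recall $R$ under the usual harmonic-mean definition (assuming all three figures are computed over the same set of mappings):

$$F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.826 \times 0.985}{0.826 + 0.985} = \frac{1.62722}{1.811} \approx 0.899$$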
3. Embedding-Based Retrieval and Clustering for OCMQ Generation
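SafeTerm AMQ builds transformer-based biomedical embeddings for both queries and PTs, projecting them to a shared high-dimensional medical map. Semantic proximity between a query embedding $\mathbf{e}_q$ and a PT embedding $\mathbf{e}_t$ is scored via cosine similarity:

$$\mathrm{sim}(q, t) = \frac{\mathbf{e}_q \cdot \mathbf{e}_t}{\lVert \mathbf{e}_q \rVert \, \lVert \mathbf{e}_t \rVert}$$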
Bivariate similarity (to query and to best-match PT embedding) is clustered via k-means (k=2) to isolate high-relevance candidates (Vandenhende et al., 8 Dec 2025).
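A minimal sketch of this two-step selection follows, using toy three-dimensional vectors in place of real transformer embeddings. The vectors, the term names, and the helper `select_candidates` are hypothetical, and the deterministic centroid initialization is an assumption made for reproducibility. None of this reflects SafeTerm's actual implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def two_means(points, max_iter=100):
    """Lloyd's algorithm with k=2; label 1 marks the higher-similarity cluster."""
    # Assumed deterministic start: the points with the lowest and highest sums.
    centroids = [min(points, key=sum), max(points, key=sum)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            0 if math.dist(p, centroids[0]) <= math.dist(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    if sum(centroids[0]) > sum(centroids[1]):
        labels = [1 - lab for lab in labels]
    return labels


def select_candidates(query_vec, pt_vectors):
    """Keep PTs in the cluster closest to both the query and the best-match PT."""
    names = list(pt_vectors)
    to_query = {n: cosine(query_vec, pt_vectors[n]) for n in names}
    best = max(names, key=to_query.get)
    points = [(to_query[n], cosine(pt_vectors[best], pt_vectors[n])) for n in names]
    labels = two_means(points)
    return [n for n, lab in zip(names, labels) if lab == 1]


# Hypothetical toy embeddings, for illustration only.
QUERY = (1.0, 0.2, 0.0)
PT_VECTORS = {
    "Hepatic failure": (0.9, 0.3, 0.1),
    "Hepatitis": (0.8, 0.1, 0.2),
    "Headache": (0.1, 0.9, 0.3),
    "Rash": (0.0, 0.2, 1.0),
}
```

On these toy vectors, `select_candidates(QUERY, PT_VECTORS)` keeps the two hepatic terms and drops the other two, because both of their similarity coordinates sit far from the low-similarity cluster.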
Term selection is threshold-tunable. At moderate cutoffs (e.g., 0.60), recall exceeds 80% but precision remains low (~11%). Increasing the threshold to 0.70–0.75 optimizes F1 (≈0.37–0.39) and raises precision over 30% (Vandenhende et al., 8 Dec 2025). For maximum specificity (precision ≈ 0.86), very high cutoffs (>0.85) are used at the cost of recall (< 0.2). Recommendations include starting at 0.60 for breadth, raising threshold for refined lists, and always anchoring queries to MedDRA PT nomenclature for optimal accuracy.
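The precision and recall trade-off behind these recommendations can be reproduced in miniature by sweeping a cutoff over scored candidates and comparing each selection with a reference set. The scores, the term labels, and the reference set below are invented for illustration and do not come from the cited evaluation.

```python
def precision_recall_f1(selected, gold):
    """Set-based precision, recall, and F1 for one candidate list."""
    tp = len(selected & gold)
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def sweep(scores, gold, cutoffs):
    """Return (cutoff, precision, recall, f1) for each similarity cutoff."""
    rows = []
    for cutoff in cutoffs:
        selected = {term for term, s in scores.items() if s >= cutoff}
        rows.append((cutoff, *precision_recall_f1(selected, gold)))
    return rows


# Hypothetical similarity scores and reference query membership.
TOY_SCORES = {"PT_A": 0.92, "PT_B": 0.78, "PT_C": 0.71,
              "PT_D": 0.66, "PT_E": 0.62, "PT_F": 0.61}
TOY_GOLD = {"PT_A", "PT_B", "PT_D"}

for row in sweep(TOY_SCORES, TOY_GOLD, [0.60, 0.70, 0.85]):
    print("cutoff=%.2f  P=%.2f  R=%.2f  F1=%.2f" % row)
```

In this toy case the 0.60 cutoff recovers every reference term at 50% precision, whereas the 0.85 cutoff keeps a single, correct term and loses two thirds of the reference set, mirroring the direction of the reported curves.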
4. Ontology-Based Semantic Similarity Measures and Clustering for AE Grouping
Advanced OCMQ workflows can utilize semantic similarity measures (SSMs) computed over ontologies integrated in the UMLS Metathesaurus (MedDRA + SNOMED CT + MeSH). SSMs fall into path-based (e.g., WUPALMER, LCH) and intrinsic information content (IC)-based categories (e.g., INTRINSIC_LIN, INTRINSIC_LCH, SOKAL):
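- WUPALMER: $\mathrm{sim}_{\mathrm{WP}}(c_1, c_2) = \dfrac{2 \cdot \mathrm{depth}(\mathrm{lcs}(c_1, c_2))}{\mathrm{depth}(c_1) + \mathrm{depth}(c_2)}$, where $\mathrm{lcs}$ is the least common subsumer of the two concepts.
- IC-based example (Lin): $\mathrm{sim}_{\mathrm{Lin}}(c_1, c_2) = \dfrac{2 \cdot \mathrm{IC}(\mathrm{lcs}(c_1, c_2))}{\mathrm{IC}(c_1) + \mathrm{IC}(c_2)}$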
IC-based SSMs consistently outperform path-based methods in clustering accuracy (F1 up to 0.404 for INTRINSIC_LCH vs F1 0.36 for WUPALMER), enabling aggregation of clinically related PTs for more sensitive signal detection and reduced manual review burden (Painter et al., 26 Mar 2025). A plausible implication is that IC-based SSMs should be favored in automated OCMQ generation and validation pipelines.
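The difference between the two families can be seen on a toy taxonomy. The sketch below assumes a single-parent tree, a root depth of 1 for Wu-Palmer, and the Seco-style intrinsic IC, $1 - \log(\text{descendants} + 1) / \log(N)$. The source does not state which intrinsic IC formulation was used, and the five-concept tree is hypothetical rather than an extract of MedDRA or UMLS.

```python
import math

# Hypothetical child -> parent map for a tiny single-parent taxonomy.
PARENT = {
    "hepatic_disorders": "root",
    "skin_disorders": "root",
    "hepatitis": "hepatic_disorders",
    "hepatic_failure": "hepatic_disorders",
    "rash": "skin_disorders",
}
NODES = {"root", *PARENT}


def ancestors(concept):
    """Path from the concept up to the root, including both ends."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Number of nodes on the path to the root (the root has depth 1)."""
    return len(ancestors(concept))


def lcs(c1, c2):
    """Least common subsumer: the deepest ancestor shared by both concepts."""
    seen = set(ancestors(c1))
    for a in ancestors(c2):
        if a in seen:
            return a
    raise ValueError("concepts do not share a root")


def wu_palmer(c1, c2):
    return 2 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))


def intrinsic_ic(concept):
    """Seco-style intrinsic IC (assumed): leaves score 1, the root scores 0."""
    descendants = sum(1 for n in NODES if n != concept and concept in ancestors(n))
    return 1 - math.log(descendants + 1) / math.log(len(NODES))


def lin(c1, c2):
    denom = intrinsic_ic(c1) + intrinsic_ic(c2)
    return 0.0 if denom == 0 else 2 * intrinsic_ic(lcs(c1, c2)) / denom
```

For the unrelated pair `("hepatitis", "rash")`, Wu-Palmer still returns 1/3 because both concepts share the root, whereas Lin returns 0 because the root carries no information content. That separation of unrelated terms is one intuition for why IC-based measures may cluster more cleanly.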
5. Performance Benchmarks and Validation against Standard OCMQ/SMQ Sets
Automated pipelines for OCMQ generation, as assessed against FDA OCMQ v3.0 and SMQ gold standards, display a trade-off between recall and precision as similarity thresholds are adjusted. The table below summarizes SafeTerm performance (Vandenhende et al., 8 Dec 2025):
| Threshold | Precision | Recall | F1 |
|---|---|---|---|
| 0.50 | 0.03 | 0.95 | 0.05 |
| 0.60 | 0.11 | 0.84 | 0.18 |
| 0.70 | 0.34 | 0.57 | 0.37 |
| 0.75 | 0.49 | 0.42 | 0.39 |
| 0.90 | 0.86 | 0.09 | 0.14 |
For SMQs, mean F1 ≈ 0.36 at 0.70 threshold and ≈0.32 using automated knee-based thresholding; narrow-term PT subsets require slightly higher cutoffs for optimal clustering (Vandenhende et al., 8 Dec 2025). These empirical curves define operational choices for safety reviewers balancing sensitivity and specificity in OCMQ application.
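One way a reviewer might turn the tabulated curve into an operating point is to fix a minimum acceptable recall and take the most precise threshold that satisfies it. The sketch below encodes the table rows as given; the selection rules and function names are illustrative assumptions, not a procedure described in the cited work.

```python
# (threshold, precision, recall, f1), copied from the table above.
SAFETERM_ROWS = [
    (0.50, 0.03, 0.95, 0.05),
    (0.60, 0.11, 0.84, 0.18),
    (0.70, 0.34, 0.57, 0.37),
    (0.75, 0.49, 0.42, 0.39),
    (0.90, 0.86, 0.09, 0.14),
]


def best_f1(rows):
    """Row with the highest tabulated F1."""
    return max(rows, key=lambda r: r[3])


def most_precise_with_recall(rows, min_recall):
    """Most precise row whose recall meets the floor, or None if none does."""
    eligible = [r for r in rows if r[2] >= min_recall]
    return max(eligible, key=lambda r: r[1]) if eligible else None


print(best_f1(SAFETERM_ROWS))
print(most_precise_with_recall(SAFETERM_ROWS, 0.80))
```

With a recall floor of 0.80 this rule selects the 0.60 threshold, while maximizing F1 alone selects 0.75, which matches the breadth-first versus refined-list distinction drawn in the text.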
6. Bayesian Semantic Borrowing and Quantitative Signal Detection
Continuous semantic similarity-based borrowing in quantitative disproportionality analysis (DPA) has replaced rigid HLGT-based grouping in some advanced OCMQ pipelines. Bayesian hierarchical models incorporate SSMs to weight information sharing across PTs, so that closely related PTs contribute more than distant ones.
MAP-style (meta-analytic-predictive) prior distributions are mixed with vague components, with the overall mixture weight set data-adaptively. Compared to HLGT borrowing, SSM-based DPA achieved higher sensitivity (+7%), detected signals on average 5 months earlier, and avoided dilution by unrelated PT contributions. This stability is pronounced in early post-marketing analysis, supporting regulatory safety decisions (Haguinet et al., 16 Apr 2025).
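The idea of similarity-weighted borrowing can be conveyed with a deliberately simplified stand-in: a conjugate Gamma-Poisson model in which each neighbouring PT adds its observed and expected counts to the prior, discounted by its similarity to the target PT. This is not the hierarchical model of Haguinet et al. It omits the vague mixture component and the data-adaptive weight, and the function name, default hyperparameters, and counts are all assumptions for illustration.

```python
def posterior_relative_rate(n, expected, neighbours, a0=0.5, b0=0.5):
    """Posterior mean of a drug-event relative reporting rate.

    n, expected : observed and expected report counts for the target PT.
    neighbours  : iterable of (similarity, n_j, expected_j) for other PTs,
                  with similarity in [0, 1] acting as a discount weight.
    a0, b0      : weak Gamma prior hyperparameters (assumed values).
    """
    a = a0 + sum(s * n_j for s, n_j, _ in neighbours)
    b = b0 + sum(s * e_j for s, _, e_j in neighbours)
    # A Gamma(a, b) prior with a Poisson(rate * expected) likelihood
    # yields a Gamma(a + n, b + expected) posterior.
    return (a + n) / (b + expected)


# Hypothetical counts: one closely related PT and one unrelated PT.
neighbours = [
    (0.9, 12, 3.0),  # similar PT with elevated reporting
    (0.1, 2, 4.0),   # dissimilar PT, heavily down-weighted
]

print(posterior_relative_rate(3, 1.0, []))          # no borrowing
print(posterior_relative_rate(3, 1.0, neighbours))  # similarity-weighted borrowing
```

In this toy setting the sparse target PT (3 observed versus 1 expected) is pulled toward the elevated rate of its close neighbour, while the dissimilar PT contributes little. This is the dilution-avoidance behaviour that the text attributes to SSM-based borrowing.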
7. Practical Guidance and Implications for Pharmacovigilance
OCMQ design and deployment now depend on multi-method approaches:
- Begin automated retrieval at moderate similarity thresholds (0.55–0.60) to maximize recall; manually inspect top candidates for specificity.
- For high-precision needs or hypothesis-driven OCMQ construction, increase cutoff (0.70–0.80) and anchor queries to MedDRA PTs.
- Integrate IC-based SSM clustering to construct custom groupings aligned with clinical mechanisms, not merely MedDRA hierarchy.
- Employ Bayesian semantic borrowing for quantitative signal detection, especially in spontaneous reporting and early post-market surveillance.
- Maintain continuous PT and ontology updates, synchronize with MedDRA releases, and utilize expert-in-the-loop review, using web-based interfaces for efficiency (Painter et al., 26 Mar 2025).
This suggests that hybrid pipelines combining high-throughput NLP, embedding-based clustering, ontology SSMs, and Bayesian models can provide near-real-time, reproducible, and regulatory-aligned frameworks for OCMQ development and application, enhancing both the sensitivity and interpretability of drug safety monitoring.