Safeterm Trial-Safety App Overview

Updated 14 December 2025

The Safeterm Trial-Safety App is a clinical trial safety analytics platform that integrates transformer-based MedDRA encoding with automated signal detection and PRO instrument optimization.
It utilizes unsupervised semantic clustering, spectral analysis, and advanced visualizations to balance patient burden with comprehensive adverse event coverage.
The modular design is intended to fit into existing clinical workflows, supporting reproducible safety assessments and MedDRA query generation for regulatory review.

The Safeterm Trial-Safety App is a web- and API-based platform for clinical trial safety analytics and patient-reported outcome (PRO) instrument optimization. Safeterm uses a high-dimensional transformer-based embedding model to encode MedDRA Preferred Terms (PTs), and combines historical adverse event (AE) data, semantic mapping, clustering, utility-driven selection, and visualizations to support automated signal detection, PRO-CTCAE design, and knowledge-based review. This approach makes the trade-off between patient burden and signal coverage explicit, enables unsupervised and reproducible MedDRA query generation, and aids the interpretation of trial data by sponsors and regulatory professionals (Vandenhende et al., 7 Dec 2025, Vandenhende et al., 8 Dec 2025, Vandenhende et al., 8 Dec 2025, Vandenhende et al., 24 Nov 2025).

1. System Architecture and Data Flow

The Safeterm Trial-Safety App operates through modular backend and frontend components:

Frontend (Web Client): Users interact via a React/TypeScript interface, inputting historical AE profiles as MedDRA PT lists (with optional incidence counts). The app provides ranked PRO-CTCAE candidate tables, interactive plots (2D projections, leverage vs. rank), and CSV/Excel export (Vandenhende et al., 7 Dec 2025).
Backend API: Implemented in Python (FastAPI/Flask), it exposes RESTful endpoints (e.g., /select_pro for PRO-CTCAE selection, and Automated Medical Query (AMQ) endpoints for MedDRA queries) that orchestrate the mapping, embedding, scoring, clustering, and spectral selection pipelines.
Data Stores: SQL/NoSQL databases hold MedDRA dictionaries, mapping tables linking PRO-CTCAE items to PTs, and the Safeterm embedding model (PyTorch, d=300).
Outputs: Structured JSON returns candidate term rankings (relevance, utility, diversity, leverage), recommended cut-offs (k_opt), and scores, with direct export and browser-based visualization capabilities.

This architecture supports seamless integration with EDC/pharmacovigilance workflows and enables interactive data-driven refinement for safety monitoring, PRO selection, and query generation.

2. MedDRA Mapping and Semantic Embedding

PRO-CTCAE to MedDRA Mapping: Each PRO-CTCAE symptom (≈124 plain-language items) is manually mapped by expert terminologists to one or two MedDRA PTs, with lexical ambiguity resolved via Lowest Level Terms (LLTs); this preserves the original PRO intent while providing semantic linkage (Vandenhende et al., 7 Dec 2025).

Safeterm Embedding Model: All MedDRA PTs are encoded by a transformer-based model, trained on large biomedical corpora and the MedDRA hierarchy, yielding normalized vectors $\mathbf{e}_{PT}\in \mathbb{R}^{300}$. This embedding space forms the basis for all semantic computations (cosine similarity, clustering, diversity scoring).

Semantic Similarity: For two normalized vectors $x$, $y$, similarity reduces to the dot product, $cosine(x,y) = x \cdot y$, because both have unit norm. The embedding captures broader relationships (clinical, mechanistic, linguistic) than the strict MedDRA hierarchy alone (Vandenhende et al., 24 Nov 2025).
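
To make the dot-product shortcut concrete, the following minimal sketch normalizes a few toy vectors and compares them. The three-dimensional vectors and their values are invented for illustration; they are not Safeterm embeddings, which are described above as 300-dimensional.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def cosine(x, y):
    """Cosine similarity of two unit vectors, which is just their dot product."""
    return sum(a * b for a, b in zip(x, y))

# Toy stand-ins for PT embeddings (made-up values, 3 dimensions instead of 300).
nausea = normalize([0.9, 0.1, 0.2])
vomiting = normalize([0.8, 0.2, 0.1])
rash = normalize([0.1, 0.9, 0.3])

sim_close = cosine(nausea, vomiting)  # clinically related pair
sim_far = cosine(nausea, rash)        # unrelated pair
```

With these toy values the related pair scores much higher than the unrelated one, which is the property every downstream step (relevance, redundancy, clustering) relies on.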

3. Relevance, Utility, and Diversity Ranking

Relevance Scoring:

Redundancy among PRO items: $S = E_{PRO}\cdot E_{PRO}^T$ , $S_{i,j} = cosine(e_i,e_j) \in [0,1]$ .
Relevance to AE history: $Q = E_{trial}\cdot E_{PRO}^T$ , $Q_{i,j} = cosine(e_{trial_i},e_{PRO_j})$ .
Raw relevance: $R_j = \max_i Q_{i,j}$ .
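Incidence weighting: $W_j = \sum_{i:\, Q_{i,j} > \alpha R_j} w_i$, where $w_i$ is the incidence weight of trial PT $i$ and $\alpha=0.9$; that is, the weights of all trial PTs whose similarity to item $j$ is within a factor $\alpha$ of the best match are summed.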
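
The relevance quantities above can be sketched with NumPy as follows. This is an illustrative reading of the formulas rather than the app's implementation; the function name `relevance_scores`, the toy two-dimensional embeddings, and the incidence counts are all assumptions.

```python
import numpy as np

def relevance_scores(E_trial, E_pro, w, alpha=0.9):
    """Return (R, W) for each PRO item.

    E_trial: (n_trial, d) unit-norm embeddings of the trial AE PTs
    E_pro:   (n_pro, d)   unit-norm embeddings of the PRO-mapped PTs
    w:       (n_trial,)   incidence weights of the trial PTs
    """
    Q = E_trial @ E_pro.T            # Q[i, j] = cosine(trial_i, PRO_j)
    R = Q.max(axis=0)                # raw relevance R_j = max_i Q[i, j]
    near_best = Q > alpha * R        # trial PTs close to the best match of item j
    W = (near_best * w[:, None]).sum(axis=0)
    return R, W

# Toy example: three trial PTs, two PRO items (made-up unit vectors).
E_trial = np.array([[1.0, 0.0], [0.96, 0.28], [0.0, 1.0]])
E_pro = np.array([[1.0, 0.0], [0.6, 0.8]])
w = np.array([5.0, 3.0, 2.0])        # hypothetical incidence counts

R, W = relevance_scores(E_trial, E_pro, w)
```

In this toy case the first PRO item collects the weights of the two trial PTs that sit close to it, while the third trial PT contributes only to the second item.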

Utility Function:

Saturated relevance: $R^*_j = 1/(1+e^{-k(R_j-x_0)})$ ( $k=20$ , $x_0=0.8$ ).
Combined utility: $U_j = R^*_j + \beta \cdot (W_j/\max_j W_j)$ , $\beta=0.1$ .
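
A short worked sketch of the utility function, using the constants stated above ($k=20$, $x_0=0.8$, $\beta=0.1$). The function names and the input values are illustrative assumptions, as is the fallback used when no incidence weights are available.

```python
import math

def saturated_relevance(r, k=20.0, x0=0.8):
    """Logistic squashing R*_j = 1 / (1 + exp(-k (R_j - x0)))."""
    return 1.0 / (1.0 + math.exp(-k * (r - x0)))

def combined_utility(R, W, beta=0.1):
    """U_j = R*_j + beta * W_j / max_j W_j for parallel lists R and W."""
    w_max = max(W)
    if w_max <= 0:
        # Assumption: with no incidence information, fall back to R* alone.
        return [saturated_relevance(r) for r in R]
    return [saturated_relevance(r) + beta * w / w_max for r, w in zip(R, W)]

# Hypothetical raw relevances and incidence weights for three PRO items.
U = combined_utility(R=[0.7, 0.8, 0.9], W=[2.0, 5.0, 10.0])
```

The steep logistic ($k=20$) maps a raw relevance of 0.8 to exactly 0.5 and pushes values a tenth away on either side towards 0 or 1, so incidence (capped at $\beta=0.1$) mainly breaks ties among items of similar relevance.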

L-kernel for Utility/Diversity: Following determinantal point process (DPP) theory, the kernel is $L_{i,j} = U_i \cdot S_{i,j} \cdot U_j$, i.e. $L = \text{diag}(U)\,S\,\text{diag}(U)$. Diagonal entries ($L_{i,i} = U_i^2$, since $S_{i,i}=1$) capture utility; off-diagonal entries encode semantic overlap penalties.
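
The sketch below builds the L-kernel for three hypothetical items and uses the standard DPP property that a subset's score is proportional to the determinant of the corresponding submatrix of $L$. The utilities and similarities are invented; items 0 and 1 are near-duplicates, item 2 is distinct but less useful.

```python
import numpy as np

def l_kernel(U, S):
    """L = diag(U) S diag(U), i.e. L[i, j] = U[i] * S[i, j] * U[j]."""
    U = np.asarray(U, dtype=float)
    S = np.asarray(S, dtype=float)
    return U[:, None] * S * U[None, :]

U = np.array([0.9, 0.8, 0.5])                 # hypothetical utilities
S = np.array([[1.00, 0.95, 0.10],
              [0.95, 1.00, 0.20],
              [0.10, 0.20, 1.00]])            # hypothetical cosine similarities
L = l_kernel(U, S)

def subset_score(L, items):
    """Unnormalized DPP score of a subset: det of the principal submatrix."""
    idx = np.array(items)
    return float(np.linalg.det(L[np.ix_(idx, idx)]))

score_redundant = subset_score(L, [0, 1])     # two useful but overlapping items
score_diverse = subset_score(L, [0, 2])       # one useful item plus a distinct one
```

Here the diverse pair scores higher than the redundant pair even though its second item has lower utility, which illustrates how the off-diagonal entries penalize semantic overlap.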

4. Spectral Analysis and Orthogonal Symptom Selection

Eigen-Decomposition and Explained Variance:

$L = V\Lambda V^T$, where $\Lambda = \text{diag}(\lambda_1,...,\lambda_{N_{\text{PRO}}})$ holds the eigenvalues sorted in decreasing order and the columns of $V$ are the corresponding orthonormal eigenvectors.
Cumulative explained variance: $CV(j) = (\sum_{i=1}^j \lambda_i)/(\sum_{i=1}^{N_{\text{PRO}}} \lambda_i)$ .
Minimal orthogonal set size: $k_{opt} = \min\{j \mid CV(j) \geq \text{info\_threshold}\}$ (typical thresholds: 0.90 to 0.975).

Diversity Leverage Score:

For item $j$ : $\text{Leverage}_j = \sum_{i=1}^{k_{opt}} (V_{j,i})^2$ .
Items are ranked by leverage and the top $k_{opt}$ are selected, so that the chosen set covers all $k_{opt}$ retained axes.
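
The spectral steps of this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the app's code: the function name `spectral_selection`, the default threshold of 0.95, and the toy kernel are all hypothetical.

```python
import numpy as np

def spectral_selection(L, info_threshold=0.95):
    """Return (k_opt, leverage, ranking) for a symmetric PSD L-kernel."""
    eigvals, eigvecs = np.linalg.eigh(L)           # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]              # re-sort in decreasing order
    eigvals = np.clip(eigvals[order], 0.0, None)   # guard tiny negative round-off
    eigvecs = eigvecs[:, order]
    if eigvals.sum() == 0.0:
        raise ValueError("L has no positive eigenvalues")

    cv = np.cumsum(eigvals) / eigvals.sum()        # CV(j) for j = 1..N
    # First j with CV(j) >= threshold; searchsorted gives a 0-based index.
    k_opt = min(int(np.searchsorted(cv, info_threshold)) + 1, len(eigvals))
    leverage = (eigvecs[:, :k_opt] ** 2).sum(axis=1)
    ranking = np.argsort(leverage)[::-1]           # item indices, highest leverage first
    return k_opt, leverage, ranking

# Toy kernel: items 0 and 1 are near-duplicates, item 2 is distinct.
U = np.array([0.9, 0.8, 0.5])
S = np.array([[1.00, 0.95, 0.10],
              [0.95, 1.00, 0.20],
              [0.10, 0.20, 1.00]])
L = U[:, None] * S * U[None, :]
k_opt, leverage, ranking = spectral_selection(L, info_threshold=0.95)
selected = ranking[:k_opt]
```

Because the retained eigenvectors are orthonormal, each leverage score lies between 0 and 1 and the scores sum to $k_{opt}$, which is a convenient sanity check on any implementation.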

5. Automated MedDRA Query Generation and Validation

Safeterm incorporates AMQ (Automated Medical Query) features for MedDRA term retrieval (Vandenhende et al., 8 Dec 2025, Vandenhende et al., 8 Dec 2025):

Workflow: Free-text query or MedDRA PT input $\to$ embedding $\to$ cosine similarity computation $\to$ extreme-value (two-means) clustering $\to$ knee-point threshold selection $\to$ ranked PT candidate list.
Thresholding: Evaluated against Standardised MedDRA Queries (SMQs) and FDA OND Custom Medical Queries (OCMQs), lower thresholds (e.g., 0.50–0.60) maximize recall (≈0.94 for SMQs, ≈0.95 for OCMQs); higher thresholds (0.70–0.90) increase precision (up to 0.89 for SMQs, 0.86 for OCMQs) at the cost of recall.
Performance: At the F1-optimal threshold (≈0.70): SMQ recall 0.48, precision 0.45, F1 0.44; OCMQ recall 0.57, precision 0.34, F1 0.37.
Narrow-term PTs: These require slightly higher similarity thresholds; recall is maintained, while precision is slightly reduced owing to the smaller gold-standard set.
Recommendations: Use valid MedDRA PTs as queries, adjust thresholds to match sensitivity/specificity needs, integrate with EDC systems for real-time query generation and review.
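
The thresholding part of the workflow can be illustrated with a simplified sketch. It runs a one-dimensional two-means split on similarity scores and takes the midpoint between the two centroids as the cut-off; this midpoint rule is a stand-in assumption, since the knee-point procedure is not specified here. The similarity values, the gold set, and all function names are invented for the example.

```python
def two_means_threshold(scores, max_iter=100):
    """Split 1-D scores into a low and a high group (Lloyd's algorithm, k=2).

    Returns the midpoint between the two converged centroids.
    """
    lo, hi = min(scores), max(scores)
    if lo == hi:
        return lo
    for _ in range(max_iter):
        cut = (lo + hi) / 2.0
        low = [s for s in scores if s <= cut]
        high = [s for s in scores if s > cut]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0

def retrieve(similarities, threshold):
    """Return PTs with similarity above the threshold, best match first."""
    hits = [(pt, s) for pt, s in similarities.items() if s > threshold]
    return [pt for pt, _ in sorted(hits, key=lambda p: p[1], reverse=True)]

def precision_recall_f1(retrieved, gold):
    """Compare a retrieved PT list against a gold-standard PT set."""
    tp = len(set(retrieved) & set(gold))
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up cosine similarities of candidate PTs to a query embedding.
similarities = {"Nausea": 0.92, "Vomiting": 0.88, "Retching": 0.81,
                "Headache": 0.35, "Fatigue": 0.30, "Rash": 0.22}
gold = {"Nausea", "Vomiting", "Retching", "Regurgitation"}  # hypothetical gold set

threshold = two_means_threshold(list(similarities.values()))
retrieved = retrieve(similarities, threshold)
precision, recall, f1 = precision_recall_f1(retrieved, gold)
```

Raising or lowering `threshold` by hand reproduces the precision/recall trade-off described above: a lower cut-off admits more candidates and raises recall, a higher one trims the list and raises precision.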

6. Visualization, Knowledge Layer, and Clustering

Hidden Medical Knowledge Layer: Safeterm augments MedDRA PTs with high-dimensional embeddings, semantic descriptors, and precomputed pairwise cosine similarities, forming a latent relationship graph (Vandenhende et al., 24 Nov 2025).

Automatic Clustering:

Trial-observed PT embeddings are reduced (PCA) and clustered via agglomerative or k-means algorithms.
Cluster identity is decoded via AI translators from embedding centroids; ungrouped PTs (low silhouette scores) are flagged and colored distinctly.

Shrinkage Incidence Ratio (SIR) and Cluster-Level EBGM:

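Expected count: $E_{ij} = N_i \frac{n_{\cdot j}}{N_{\cdot}}$; shrunken ratio: $SIR_{ij} = \frac{n_{ij} + \alpha}{E_{ij} + \beta}$ (gamma-Poisson shrinkage; here $\alpha$ and $\beta$ are shrinkage hyperparameters, distinct from the relevance threshold $\alpha$ and utility weight $\beta$ of Section 3).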
Cluster-level aggregation: Precision-weighted mean $\mathrm{EBGM}_{i,C} = \frac{\sum_{j\in C} w_j SIR_{ij}}{\sum_{j\in C} w_j}$ , $w_j=\frac{n_{ij}+\alpha}{SIR_{ij}^2}$ .
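
A small numeric sketch of these two formulas follows. The shrinkage values $\alpha=\beta=0.5$, the counts, and the reading of $N_i$ as the total for arm $i$, $n_{\cdot j}$ as the total for PT $j$ across arms, and $N_{\cdot}$ as the grand total are assumptions made for illustration only.

```python
def expected_count(N_i, n_dot_j, N_total):
    """E_ij = N_i * n_.j / N_. (arm total times the overall share of PT j)."""
    return N_i * n_dot_j / N_total

def sir(n_ij, E_ij, alpha=0.5, beta=0.5):
    """Shrinkage incidence ratio (n_ij + alpha) / (E_ij + beta)."""
    return (n_ij + alpha) / (E_ij + beta)

def cluster_ebgm(counts, expecteds, alpha=0.5, beta=0.5):
    """Precision-weighted mean of SIR over the PTs of one cluster."""
    sirs = [sir(n, e, alpha, beta) for n, e in zip(counts, expecteds)]
    weights = [(n + alpha) / s ** 2 for n, s in zip(counts, sirs)]
    return sum(w * s for w, s in zip(weights, sirs)) / sum(weights)

# Hypothetical arm holding 100 of 200 total units; two PTs in one cluster.
N_i, N_total = 100, 200
pt_totals = [12, 6]            # n_.j: counts of each PT across all arms
counts = [9, 5]                # n_ij: counts of each PT in arm i

expecteds = [expected_count(N_i, t, N_total) for t in pt_totals]
sirs = [sir(n, e) for n, e in zip(counts, expecteds)]
ebgm = cluster_ebgm(counts, expecteds)
```

Both PTs are observed about 1.5 times as often as expected in this toy arm, and the cluster-level value falls between the two per-PT ratios, weighted towards the PT with the larger count.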

Visualization Outputs:

Semantic Map: 2D PCA/t-SNE projection of PTs, colored by semantic cluster, sized by incidence rate; interactive filtering and tooltip details.
Expectedness-versus-Disproportionality Plot (EVD): X-axis: expectedness (cosine similarity to disease indication vector), Y-axis: $SIR_{ij}$ . Points colored by cluster, sized by incidence. Outliers (low expectedness, high $SIR$ ) denote novel safety signals.

7. Empirical Results and Practical Integration

Monte Carlo Simulations (N=100,000): Mean recall 0.70, precision 0.72, F1 0.70 (info threshold 97.5%), stable across signal/noise levels (Vandenhende et al., 7 Dec 2025).

Oncology Case Study (Multiple Myeloma):

Phase I: Algorithm selected $k_{opt}=16$ PRO-CTCAE items; all matched AE PTs; 9 exact-matches flagged and excluded for redundancy.
Phase II: Automated list overlapped with 8 of 15 manual PROs; coverage was comparable (auto 11/16, manual 11/15 retrieved).
Automated selection provided objective, reproducible design and explicit burden–coverage justification.

Legacy Trials with Semantic Clustering:

Duchenne Muscular Dystrophy: Liver damage cluster detected (semantic map, cluster-level EBGM); minor hepatotoxicity signals enriched.
Narcolepsy Dose-Response: A dose-dependent rise in SIR was detected for the stress cluster.
Hodgkin’s Lymphoma: Bone marrow failure cluster differentiated between treatments.

Practical Recommendations:

Start broad signal detection at moderate thresholds, then refine for specificity as needed.
Leverage semantic clustering and visualization for hypothesis generation and transparent safety review.
Integrate app endpoints with clinical EDC, pharmacovigilance, and dashboard systems.

Safeterm transforms trial safety workflows by embedding MedDRA PTs in a semantically calibrated hidden space, enabling objective, reproducible PRO selection, rapid and unsupervised term query generation, and advanced clustering-based signal analysis, validated across diverse oncology and neurology trials (Vandenhende et al., 7 Dec 2025, Vandenhende et al., 8 Dec 2025, Vandenhende et al., 8 Dec 2025, Vandenhende et al., 24 Nov 2025).