Generative Medical Event Models
- Generative Medical Event Models are probabilistic and neural frameworks that simulate sequences of clinical events by explicitly modeling temporal and spatial dependencies.
- They employ methods such as spatiotemporal Poisson processes, Markovian latent-variable models, and transformer-based architectures to forecast diagnoses, treatments, and outcomes.
- These models advance disease mapping, prognostic forecasting, and synthetic data generation, ultimately enhancing clinical decision support and operational planning.
A generative medical event model is a probabilistic or neural framework designed to simulate or forecast sequences of medical events—such as diagnoses, treatments, laboratory results, or progressions of disease—by explicitly modeling the data-generating process underlying real-world clinical trajectories. Such models are increasingly foundational in medical data analysis, supporting tasks ranging from disease mapping and prognosis forecasting to synthetic data generation and clinical workflow simulation. Recent research spans approaches from spatiotemporal Poisson processes and Markovian latent-variable models to transformer-based foundation models trained on very large-scale electronic health records (EHRs) and multimodal patient data.
1. Theoretical Foundations and Generative Frameworks
Generative medical event models are formulated to capture the complete data-generating process of clinical phenomena by specifying explicit probabilistic dependencies among observed and latent variables governing medical events over time and space. For spatial disease mapping, a coherent generative specification models the disease count $Y_i$ in region $i$ as a Poisson random variable conditioned on the local incidence probability $p_i$ and the population at risk $n_i$:

$$Y_i \mid p_i \sim \mathrm{Poisson}(n_i\, p_i),$$

with incidence probabilities linked to covariates $x_i$ and spatial random effects $\phi_i$ through generalized logistic regression:

$$\mathrm{logit}(p_i) = x_i^{\top} \beta + \phi_i.$$
This distinguishes such models from internally standardized (IS) models, which are incoherent because they use the observed data to compute both response counts and expected counts on opposite sides of the model equation (Wang et al., 2016).
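A minimal numerical sketch of this coherent specification, assuming a single covariate and iid Gaussian random effects in place of a full spatial prior (e.g., CAR); all dimensions and coefficients are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: R regions, each with a population at risk and one covariate.
R = 50
n = rng.integers(5_000, 50_000, size=R)   # population at risk per region
x = rng.normal(size=R)                    # regional covariate
beta0, beta1 = -6.0, 0.3                  # illustrative regression coefficients
phi = rng.normal(scale=0.2, size=R)       # random effects (iid here; a spatial
                                          # prior such as CAR would add structure)

# Generative specification: logit(p_i) = beta0 + beta1 * x_i + phi_i,
# then Y_i ~ Poisson(n_i * p_i).
p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x + phi)))
y = rng.poisson(n * p)

# A coherent model fits (beta, phi) directly to y given n and x, rather than
# reusing the observed counts to internally standardize the expected counts.
print(y[:10])
```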
In disease progression and treatment event settings, structured latent-variable generative models are employed. For example, a Markovian generative approach for EHR event sequences introduces two sets of latent variables: a treatment class $c$ and a progression stage sequence $s_{1:T}$. The model generates observations $(a_t, \tau_t)$ (where $a_t$ is the medical action and $\tau_t$ the inter-event interval) according to a prior over $c$, Markov transitions $P(s_t \mid s_{t-1}, c)$, and explicit probability densities $p(\tau_t \mid s_t, c)$ for irregular inter-event times, such as geometric, exponential, or Weibull distributions (Zaballa et al., 2023).
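A hedged sketch of this Markovian generative process; the dimensions, stage and action distributions, and Weibull parameterization are illustrative placeholders, not the paper's fitted model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: K treatment classes, S progression stages, A action types.
K, S, A = 2, 3, 5
pi_c = np.array([0.6, 0.4])                      # prior over treatment class c
trans = rng.dirichlet(np.ones(S), size=(K, S))   # P(s_t | s_{t-1}, c)
emit = rng.dirichlet(np.ones(A), size=(K, S))    # P(a_t | s_t, c)
weib = np.array([[0.8, 1.0, 1.5],                # Weibull shape for p(tau | s, c)
                 [0.9, 1.2, 2.0]])

def sample_trajectory(T=10):
    c = rng.choice(K, p=pi_c)                    # latent treatment class
    s = rng.choice(S)                            # initial progression stage
    events = []
    for _ in range(T):
        a = rng.choice(A, p=emit[c, s])          # medical action
        tau = rng.weibull(weib[c, s])            # irregular inter-event interval
        events.append((a, tau))
        s = rng.choice(S, p=trans[c, s])         # Markov stage transition
    return c, events

print(sample_trajectory())
```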
Deep generative state-space models further abstract the patient state as a latent variable $z_t$ evolving according to learned non-linear dynamical systems (parameterized by neural networks), with observation and intervention models:

$$z_t = f_\theta(z_{t-1}, u_t) + \epsilon_t, \qquad x_t \sim p_\theta(x_t \mid z_t),$$

where $u_t$ denotes interventions and $x_t$ the observed clinical measurements. Survival prediction is linked by a time-dependent hazard function $\lambda(t \mid z_t)$, allowing survival and event incidence functions, e.g. $S(t) = \exp\!\left(-\int_0^t \lambda(u \mid z_u)\,du\right)$, to be defined over the evolving latent state (Xue et al., 28 Jul 2024).
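A compact PyTorch sketch of such a model, assuming unit time steps, Gaussian transition noise, and a log-linear hazard head; the architecture and dimensions are illustrative, not the configuration of Xue et al.:

```python
import torch
import torch.nn as nn

class DeepStateSpace(nn.Module):
    """Latent dynamics z_t = f(z_{t-1}, u_t) + noise, with observation and hazard heads."""
    def __init__(self, z_dim=16, u_dim=4, x_dim=8):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(z_dim + u_dim, 64), nn.Tanh(),
                               nn.Linear(64, z_dim))
        self.obs = nn.Linear(z_dim, x_dim)      # observation model (mean only)
        self.log_hazard = nn.Linear(z_dim, 1)   # hazard lambda(t | z_t), log-linear

    def forward(self, z0, u_seq):               # u_seq: (T, u_dim) interventions
        z, zs = z0, []
        for u in u_seq:
            z = self.f(torch.cat([z, u], dim=-1)) + 0.1 * torch.randn_like(z)
            zs.append(z)
        zs = torch.stack(zs)
        hazard = self.log_hazard(zs).exp().squeeze(-1)      # lambda_t >= 0
        survival = torch.exp(-torch.cumsum(hazard, dim=0))  # S(t), unit steps
        return self.obs(zs), hazard, survival

model = DeepStateSpace()
x_mean, hazard, survival = model(torch.zeros(16), torch.randn(12, 4))
print(survival[:5])   # survival probabilities over the first five steps
```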
2. Neural Autoregressive and Foundation Models for Medical Events
With increasing data scale and event-type diversity in EHRs, foundation models based on transformer architectures have been developed to autoregressively model and simulate complex clinical event sequences. In these architectures (e.g., CoMET (Waxler et al., 16 Aug 2025), PRISM (Levine et al., 4 Jun 2025), MedGPT (Kraljevic et al., 2021)), a patient's longitudinal history is tokenized into sequences representing granular events: diagnoses, labs, medications, procedures, visit headers, and time intervals.
A canonical decoder-only transformer processes such a sequence to predict the next event, leveraging self-attention over long event histories to model dependencies. In CoMET, input sequences are constructed using a specialized, hierarchical tokenization consistent with the ontological structure of each event type (e.g., splitting ICD-10-CM codes into category and subcategory tokens). The model is pretrained, following power-law scaling laws, on over 115 billion event tokens from 118 million patients, learning to condition the prediction of each event token $e_t$ on the preceding context $e_{1:t-1}$:

$$P_\theta(e_t \mid e_{1:t-1}).$$
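The hierarchical tokenization step can be illustrated with a toy function; the token format below is an assumption for exposition, not CoMET's actual vocabulary:

```python
def tokenize_icd10(code: str) -> list[str]:
    """Split an ICD-10-CM code into category and subcategory tokens, so that
    related codes share a category token (hypothetical token format)."""
    category, _, subcategory = code.partition(".")
    tokens = [f"ICD10_CAT:{category}"]
    if subcategory:
        tokens.append(f"ICD10_SUB:{subcategory}")
    return tokens

# E11.65 (type 2 diabetes with hyperglycemia) shares its category token with
# every other E11.* code, exposing the ontology's structure to the model.
print(tokenize_icd10("E11.65"))   # ['ICD10_CAT:E11', 'ICD10_SUB:65']
```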
Multi-label outputs are handled through joint prediction of next-visit events, with regularization schemes designed to mitigate repetition and emphasize true onset prediction (Rajamohan et al., 1 Jul 2025).
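One plausible form of such regularization is a weighted multi-label loss that penalizes re-predicting codes already in the history unless they genuinely recur; this is a speculative sketch, not the scheme of Rajamohan et al.:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

V = 1000                               # event vocabulary size (illustrative)
head = nn.Linear(256, V)               # context embedding -> per-code logits
h = torch.randn(8, 256)                # batch of patient-context embeddings
targets = torch.bernoulli(torch.full((8, V), 0.01))   # next-visit multi-hot labels
history = torch.bernoulli(torch.full((8, V), 0.05))   # codes already recorded

logits = head(h)
bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
weights = torch.ones_like(bce)
# Up-weight the penalty on historical codes that do NOT recur, so the model is
# discouraged from inflating predictions by simply repeating past diagnoses.
weights[history.bool() & ~targets.bool()] = 2.0
loss = (weights * bce).mean()
print(loss.item())
```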
These foundation models support simulation-based inference (autoregressive sampling of possible patient timelines), flexible zero-shot prediction, and generalization to unseen downstream tasks (diagnosis risk, prognosis, operational resource demands) without task-specific training (Waxler et al., 16 Aug 2025, Rajamohan et al., 1 Jul 2025, Redekop et al., 7 Mar 2025).
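Simulation-based inference reduces to repeated autoregressive sampling. A hedged sketch, assuming only that `model` maps a token sequence to next-token logits (the interface is hypothetical):

```python
import torch

@torch.no_grad()
def rollout_risk(model, context, target_token, n_samples=256, horizon=128):
    """Estimate P(target event within `horizon` steps) by Monte Carlo rollout."""
    hits = 0
    for _ in range(n_samples):
        seq = context.clone()                    # 1-D LongTensor of event tokens
        for _ in range(horizon):
            logits = model(seq.unsqueeze(0))[0, -1]           # next-event logits
            nxt = torch.multinomial(torch.softmax(logits, -1), 1)
            seq = torch.cat([seq, nxt])
            if nxt.item() == target_token:       # e.g., a diagnosis code of interest
                hits += 1
                break
    return hits / n_samples
```

The same rollouts, aggregated over many patients, yield the operational forecasts (encounter counts, readmissions) discussed in Section 6.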
3. Clinical Text and Multimodal Generative Approaches
Recent generative medical event models integrate structured and unstructured data modalities to maximize predictive fidelity and clinical utility. MedGPT extends a GPT-2 architecture to narrative EHR data by extracting disorder concepts from clinical notes, using named entity recognition and linking (e.g., MedCAT) to produce concept sequences aligned to a standard ontology (SNOMED-CT) and learning to predict future disorders from temporally ordered concepts and age tokens (Kraljevic et al., 2021).
Generative Deep Patient (GDP) is a modular, multimodal architecture that fuses structured EHR time-series (via a CNN-Transformer encoder) and unstructured clinical notes (via a BioClinicalBERT encoder) through cross-modal attention, feeding the fused representation into a LLaMA-based decoder for clinical narrative generation and event prediction. Auxiliary tasks—Masked Feature Prediction (MFP) and Next Time-Step Prediction (NTP)—are included in generative pretraining, with flexible downstream multi-task adaptation (Sivarajkumar et al., 22 Aug 2025).
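A minimal sketch of the cross-modal fusion step, assuming a single attention layer and illustrative dimensions rather than GDP's actual configuration:

```python
import torch
import torch.nn as nn

d = 256
cross_attn = nn.MultiheadAttention(embed_dim=d, num_heads=8, batch_first=True)

ehr_seq = torch.randn(2, 48, d)    # structured time-series embeddings (CNN-Transformer)
note_seq = torch.randn(2, 200, d)  # clinical-note token embeddings (BioClinicalBERT)

# Note tokens attend over the structured sequence; the fused representation is
# what would be passed on to the LLaMA-based decoder.
fused, _ = cross_attn(query=note_seq, key=ehr_seq, value=ehr_seq)
decoder_input = note_seq + fused   # residual fusion
print(decoder_input.shape)         # torch.Size([2, 200, 256])
```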
Vision-language generative models for medical imaging leverage text-visual embedding mechanisms, with tabular clinical attributes converted into textual prompts encoded by BERT and fused with visual features via affine transformations and cross-attention in GAN- or diffusion-based generators. This pipeline enables conditioning imaging generation on clinical context and reveals attribute-specific imaging patterns (e.g., increased lung intensity for "smoker" status) (Xing et al., 17 Oct 2024).
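The affine-transformation conditioning can be sketched as a FiLM-style module: tabular attributes are rendered as a text prompt, encoded (e.g., by BERT), and used to scale and shift visual feature maps. The dimensions and single-layer design here are assumptions:

```python
import torch
import torch.nn as nn

class AffineConditioning(nn.Module):
    """Scale-and-shift visual features with a text embedding (FiLM-style)."""
    def __init__(self, text_dim=768, feat_channels=64):
        super().__init__()
        self.to_scale = nn.Linear(text_dim, feat_channels)
        self.to_shift = nn.Linear(text_dim, feat_channels)

    def forward(self, visual_feats, text_emb):
        # visual_feats: (B, C, H, W); text_emb: (B, text_dim), e.g. a BERT [CLS] vector
        gamma = self.to_scale(text_emb)[:, :, None, None]
        beta = self.to_shift(text_emb)[:, :, None, None]
        return (1 + gamma) * visual_feats + beta

prompt = "age: 62; sex: male; smoker: yes"   # textualized clinical attributes
film = AffineConditioning()
out = film(torch.randn(4, 64, 32, 32), torch.randn(4, 768))  # stand-in BERT embedding
print(out.shape)   # torch.Size([4, 64, 32, 32])
```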
4. Event Extraction, Structure Awareness, and Synthetic Data Generation
Biomedical event extraction systems such as DICE (Ma et al., 2022) and GenBEE (Yuan et al., 13 Aug 2024) formulate clinical event detection and argument extraction as conditional generation problems over tokenized text sequences. Key advances include:
- Contrastive learning objectives (e.g., InfoNCE loss) for refining mention boundary detection, using perturbed negatives to force precise boundary prediction in long and vague biomedical mentions (a minimal sketch follows this list).
- Auxiliary mention identification tasks providing candidate triggers/arguments for insertion of special marker tokens, improving event/argument localization.
- Structure-aware generative prompts, with external LLMs (e.g., GPT-4) used for constructing semantically and structurally rich event templates; structural prefix encoding injected as soft prompts into the attention mechanism to enable robust handling of nested and overlapping event structures.
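The boundary-refinement objective from the first bullet can be written as a standard InfoNCE loss over span embeddings; the encoder details and negative-sampling scheme here are simplified assumptions:

```python
import torch
import torch.nn.functional as F

def boundary_infonce(anchor, positive, negatives, tau=0.07):
    """InfoNCE over span embeddings: the gold-span embedding (positive) must
    score higher against the anchor than boundary-perturbed spans (negatives).
    Shapes: anchor, positive (d,); negatives (K, d)."""
    cands = torch.cat([positive.unsqueeze(0), negatives], dim=0)         # (K+1, d)
    sims = F.cosine_similarity(anchor.unsqueeze(0), cands, dim=-1) / tau
    target = torch.zeros(1, dtype=torch.long)    # index 0 = the gold span
    return F.cross_entropy(sims.unsqueeze(0), target)

loss = boundary_infonce(torch.randn(256), torch.randn(256), torch.randn(8, 256))
print(loss.item())
```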
Synthetic event time series data generation, such as with HealthGAN (a WGAN variant trained on transformed summary statistics per fixed interval), demonstrates the ability of generative models to create privacy-preserving synthetic health event matrices. Statistically, synthetic time series replicated key univariate trends, but nuanced subgroup behaviors and variance patterns required careful covariate stratification and highlighted limitations for high-variability populations (Dash et al., 2019).
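The preprocessing idea can be sketched as reducing irregular event streams to per-interval summary statistics, yielding a dense matrix a WGAN can train on; the column choices below are illustrative, not HealthGAN's exact feature set:

```python
import pandas as pd

events = pd.DataFrame({
    "patient": [0, 0, 0, 1, 1],
    "day":     [2, 15, 40, 5, 33],
    "value":   [1.2, 0.8, 1.5, 2.1, 1.9],
})
events["interval"] = events["day"] // 30   # fixed 30-day windows

# One row per patient: count/mean/std of the measurement in each interval.
summary = (events.groupby(["patient", "interval"])["value"]
           .agg(["count", "mean", "std"])
           .unstack(fill_value=0.0)
           .fillna(0.0))                   # std is NaN for single-event intervals
print(summary)
```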
5. Performance, Uncertainty, and Evaluation
Coherent generative model frameworks (e.g., coherent generative Poisson regression, deep state-space models) deliver point estimates of risk or event timing with inferential properties at least as good as those of traditional approaches, often providing tighter credible intervals and more accurate quantification of uncertainty (Wang et al., 2016, Xue et al., 28 Jul 2024).
Evaluation metrics in neural generative event models typically include area under ROC curves (AUROC), average precision, cross-entropy loss, perplexity, and negative log-likelihood for next-event prediction. In medical event transformers, scaling the model size and pretraining compute leads to predictable improvements in pretraining loss and a monotonic increase in downstream task performance, as formalized by empirically derived power-law scaling laws for parameter and token efficiency (Waxler et al., 16 Aug 2025).
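Such scaling analyses amount to fitting a saturating power law to (compute, loss) pairs. A sketch with synthetic placeholder data, not CoMET's measurements:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(c, a, alpha, l_inf):
    # L(C) = a * C^(-alpha) + irreducible loss
    return a * np.power(c, -alpha) + l_inf

np.random.seed(0)
compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])      # pretraining FLOPs (synthetic)
loss = power_law(compute, 40.0, 0.08, 1.6) + 0.01 * np.random.randn(5)

params, _ = curve_fit(power_law, compute, loss, p0=[10.0, 0.1, 1.0],
                      bounds=([0, 0, 0], [np.inf, 1.0, np.inf]))
a, alpha, l_inf = params
print(f"fitted: a={a:.2f}, alpha={alpha:.3f}, irreducible loss={l_inf:.2f}")
```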
In event extraction, F1 scores for trigger and argument classification, both overall and under low-data regimes, provide quantification of model efficiency. For synthetic data, distributional alignment is evaluated via univariate plots, covariate probability calibration, and quantile analyses stratified by subgroups (Dash et al., 2019).
6. Translational and Clinical Impact
Large-scale generative medical event models serve as extensible engines for simulation-based clinical forecasting, decision support, and operational planning:
- Zero-shot forecasting across a range of diagnoses and time horizons is enabled by foundation models pre-trained on broad event data, with demonstrated robustness across disease categories and strong performance in predicting incipient conditions (e.g., dementia, knee osteoarthritis, heart failure) (Rajamohan et al., 1 Jul 2025, Redekop et al., 7 Mar 2025, Waxler et al., 16 Aug 2025).
- Simulation of longitudinal patient trajectories and probabilistic risk estimates (via Monte Carlo rollouts) support real-world evidence generation, operational forecasting (e.g., encounter counts, readmission), resource management, and early warning for adverse outcomes.
- Multimodal and vision-language generative models offer new modes for clinical narrative authoring, documentation automation, structured-to-unstructured text synthesis, and precision visualization of phenomena such as tissue changes related to risk factors (e.g., smoking) (Sivarajkumar et al., 22 Aug 2025, Xing et al., 17 Oct 2024).
- Generative event extraction models accelerate biomedical literature mining, pathway curation, and knowledge graph construction, especially for data-scarce domains and complex structural relationships (Yuan et al., 13 Aug 2024, Ma et al., 2022).
In summary, generative medical event models—spanning spatiotemporal probabilistic algorithms, Markovian and latent-variable models, neural autoregressive transformers, and multimodal vision-language systems—enable accurate simulation, inference, extraction, and forecasting of medical events in structured and unstructured clinical data. These frameworks are characterized by explicit handling of sequence dependencies, temporal and spatial structure, inherent uncertainty, and the integration of diverse data modalities, providing principled, scalable, and generalizable methods to support evidence-based precision medicine and healthcare operations.