
MIMICEL: ED Process Mining Event Log

Updated 23 January 2026
  • MIMICEL is a curated dataset that captures detailed ED patient flows with timestamped activities and rich clinical and demographic attributes.
  • It employs a formal process mining framework with rigorous preprocessing, activity mapping, and data cleaning to ensure accurate event sequencing.
  • The dataset supports analyses of ED throughput, crowding dynamics, and process variants, and is available in both CSV and XES formats.

The MIMICEL Event Log is a curated dataset that encodes the detailed process of patient flow in the emergency department (ED) as extracted from the publicly available MIMIC-IV-ED database. It provides a rigorously structured event log—each case is an ED stay, segmented into timestamped activities—designed to support process mining analyses of ED operations. MIMICEL encompasses a comprehensive annotation of both case-level and event-level clinical, demographic, and operational data, enabling granular exploration of throughput, crowding, and clinical process variants within the ED setting (Wei et al., 26 May 2025).

1. Formal Event-Log Structure

MIMICEL adopts a formal schema rooted in process mining conventions. The log is a multiset $L$ of traces,

$$L = \{ \sigma_c \mid c \in C \},$$

where $\sigma_c$ corresponds to a totally ordered sequence of events for ED stay $c$:

$$\sigma_c = \langle e_{c,1},\ e_{c,2},\ \ldots,\ e_{c,n_c} \rangle,$$

with each event

$$e_{c,j} = (c,\ a_{c,j},\ t_{c,j},\ \alpha_{c,j}),$$

consisting of case identifier $c$, activity label $a_{c,j}$ from a finite set $A$, timestamp $t_{c,j}$, and a vector of event-level attributes $\alpha_{c,j}$.

The six activity labels tracked are:

  • Enter the ED
  • Triage in the ED
  • Vital sign check
  • Medicine reconciliation
  • Medicine dispensation
  • Discharge from the ED

Each trace $\sigma_c$ carries case-level attributes $\beta_c$ sourced from ED admission tables.
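The schema above can be sketched as a pair of small data types. This is a minimal illustration, not the dataset's own code: the class names `Event` and `Trace` and the helper `build_log` are hypothetical, and events are assumed to arrive already in the desired order.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

# The six activity labels listed in the text (the finite set A).
ACTIVITIES = (
    "Enter the ED",
    "Triage in the ED",
    "Vital sign check",
    "Medicine reconciliation",
    "Medicine dispensation",
    "Discharge from the ED",
)


@dataclass
class Event:
    case_id: int                  # c: the ED stay this event belongs to
    activity: str                 # a_{c,j}: one of ACTIVITIES
    timestamp: datetime           # t_{c,j}
    attributes: dict[str, Any] = field(default_factory=dict)  # alpha_{c,j}


@dataclass
class Trace:
    case_id: int
    case_attributes: dict[str, Any]                     # beta_c
    events: list[Event] = field(default_factory=list)   # sigma_c


def build_log(events, case_attributes):
    """Group events into traces keyed by case id, keeping input order."""
    log: dict[int, Trace] = {}
    for ev in events:
        if ev.activity not in ACTIVITIES:
            raise ValueError(f"unknown activity label: {ev.activity!r}")
        if ev.case_id not in log:
            log[ev.case_id] = Trace(ev.case_id, case_attributes.get(ev.case_id, {}))
        log[ev.case_id].events.append(ev)
    return log
```

Keeping case-level attributes on the trace and event-level attributes on each event mirrors the split between $\beta_c$ and $\alpha_{c,j}$.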

2. Data Extraction and Transformation Pipeline

The curation pipeline follows a modified nine-step guideline by Jans et al., realized through four primary SQL scripts, each corresponding to a major stage of event-log assembly:

  • Preprocessing: Ensures timestamp integrity; removes stays where the admission time is on or after discharge.
  • Activity Identification: Identifies the major ED activities (the cornerstone ‘throughput’ activities per Asplin), maps them to their source tables using relational keys, and constructs an ER diagram linking every table via stay_id.
  • Event Formation: For each activity, relevant rows are appended to a unified table, assigning labels, timestamps, and associated measurements.
  • Cleaning and Filtration: Events are filtered to only those occurring strictly within stay intervals, with stays missing any mandatory activity (enter, triage, discharge) dropped.
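The preprocessing and cleaning rules above can be sketched in plain Python. This is an illustration only, since the actual pipeline is implemented in SQL. The function name `clean_stays` and the tuple layouts are hypothetical. The interval bounds are treated as inclusive here, on the assumption that the enter and discharge events sit exactly at the stay's endpoints.

```python
# Activities every retained stay must contain, per the text.
MANDATORY = {"Enter the ED", "Triage in the ED", "Discharge from the ED"}


def clean_stays(stays, events):
    """Apply the timestamp-integrity and filtration rules.

    stays:  dict mapping stay_id -> (intime, outtime)
    events: list of (stay_id, activity, timestamp) tuples
    Timestamps may be any mutually comparable values (e.g. datetime).
    """
    # Preprocessing: drop stays whose admission time is on or after discharge.
    valid = {sid: (t_in, t_out) for sid, (t_in, t_out) in stays.items() if t_in < t_out}

    # Filtration: keep only events inside the stay interval.
    # ASSUMPTION: bounds are inclusive so boundary events survive.
    kept = [
        (sid, act, ts)
        for sid, act, ts in events
        if sid in valid and valid[sid][0] <= ts <= valid[sid][1]
    ]

    # Drop stays that are missing any mandatory activity.
    seen = {}
    for sid, act, _ in kept:
        seen.setdefault(sid, set()).add(act)
    complete = {sid for sid, acts in seen.items() if MANDATORY <= acts}
    return [ev for ev in kept if ev[0] in complete]
```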

Conversion to process mining-compatible formats (XES) is performed using PM4Py’s csv2xes.py utility. The final dataset is made available in both CSV and XES formats to facilitate downstream process mining tool compatibility.
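To show what the XES serialization looks like, the sketch below writes a minimal XES-style XML document using only the Python standard library. It is not the csv2xes.py utility mentioned above. The function name `csv_to_xes` is hypothetical, and the column names `stay_id`, `activity`, and `timestamp` are assumed from the attribute list. Timestamps are assumed to be ISO 8601 strings already, and extension declarations and log-level metadata are omitted.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xes(csv_text, case_col="stay_id", act_col="activity", ts_col="timestamp"):
    """Serialize a CSV event log as a minimal XES-style XML string."""
    log = ET.Element("log")
    traces = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        case = row[case_col]
        if case not in traces:
            # One <trace> per case, named via the standard concept:name key.
            trace = ET.SubElement(log, "trace")
            ET.SubElement(trace, "string", key="concept:name", value=case)
            traces[case] = trace
        # One <event> per row, with activity name and timestamp.
        event = ET.SubElement(traces[case], "event")
        ET.SubElement(event, "string", key="concept:name", value=row[act_col])
        ET.SubElement(event, "date", key="time:timestamp", value=row[ts_col])
    return ET.tostring(log, encoding="unicode")
```

The keys `concept:name` and `time:timestamp` come from the standard XES concept and time extensions, which is what lets process mining tools recognise the activity and timestamp fields.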

3. Attribute Specification

MIMICEL is characterized by rich attribute annotation at both the case and event levels.

Case-level attributes ($\beta_c$):

  • stay_id: integer, unique ED stay identifier
  • subject_id: integer, unique patient identifier
  • gender: categorical (M/F/Other)
  • race: categorical
  • arrival_transport: categorical (e.g., AMBULANCE, WALK_IN, HELICOPTER)
  • disposition: categorical (e.g., HOME, ADMITTED, TRANSFERRED)
  • acuity: integer, ESI triage level 1–5
  • chiefcomplaint: string (de-identified free text)

Event-level attributes ($\alpha_{c,j}$):

  • activity: string, activity label
  • timestamp: datetime, ≥second granularity
  • hadm_id: integer (hospital admission ID, null for ED home discharges)
  • temperature: float (°F/°C)
  • heartrate, resprate, o2sat, sbp, dbp: integer (vital measurements)
  • pain: integer (0–10 self-report)
  • rhythm: string (cardiac rhythm)
  • med_rn: integer (dispensed meds count)
  • seq_num: integer (number of diagnoses)
  • name: string (medication name)
  • gsn, ndc: string (drug codes)
  • etc_rn, etccode, etcdescription: enhanced therapeutic class fields
  • gsn_rn: integer
  • icd_code, icd_version, icd_title: diagnosis fields

4. Trace Segmentation and Event Ordering

Each trace $\sigma_c$ is defined as the sequence of events sharing the same stay_id, ordered by timestamp. When multiple events share an identical timestamp (e.g., simultaneous medication dispensing), ties are preserved in stable insertion order. To maintain a logical event sequence, “Triage in the ED” is set at $\mathit{intime} + 1$ second so that it always follows “Enter the ED.” Events outside the stay interval $(\mathit{intime},\ \mathit{outtime})$ are excluded, yielding traces restricted to in-ED activities.
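The segmentation and tie-handling rules can be sketched as follows. The function names `segment_traces` and `triage_timestamp` are hypothetical, and the sketch relies on the fact that Python's `sorted` is stable, so events with equal timestamps keep their input order.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter


def segment_traces(rows):
    """Group (stay_id, activity, timestamp) rows into ordered traces.

    sorted() is stable: rows with the same (stay_id, timestamp) key
    keep their relative insertion order, which preserves ties.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    return {
        stay_id: [(activity, ts) for _, activity, ts in group]
        for stay_id, group in groupby(ordered, key=itemgetter(0))
    }


def triage_timestamp(intime):
    """Place triage one second after ED entry, as described in the text."""
    return intime + timedelta(seconds=1)


# Example: triage follows an 08:00:00 arrival by exactly one second.
example_triage = triage_timestamp(datetime(2020, 1, 1, 8, 0, 0))
```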

5. Aggregate Statistics and Process Metrics

MIMICEL comprises:

  • $|C| = 425{,}028$ cases (ED stays)
  • 205,466 distinct subject_id entries (patients)
  • $|E| = 7{,}568{,}824$ total events
  • $|A| = 6$ activity types

Average trace length is $\bar{n} = (1/|C|)\sum_{c\in C} n_c \approx 18$, with per-case event counts ranging from $3$ to $218$.

Formally:

  • $T_{start}(c) = t_{c,1}$ ("Enter the ED")
  • $T_{end}(c) = t_{c,n_c}$ ("Discharge from the ED")
  • Throughput time: $\tau(c) = T_{end}(c) - T_{start}(c)$, with mean $\mu_\tau = (1/|C|)\sum_{c \in C}\tau(c)$.
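The throughput definitions translate directly into code. The sketch below assumes each trace is a non-empty, timestamp-ordered list of `(activity, timestamp)` pairs whose first and last events are the enter and discharge events. The function names and the example data are hypothetical.

```python
from datetime import datetime, timedelta


def throughput_time(trace):
    """tau(c): time from the first event to the last event of a trace."""
    return trace[-1][1] - trace[0][1]


def mean_throughput(log):
    """mu_tau: mean of tau(c) over all traces in a non-empty log."""
    total = sum((throughput_time(t) for t in log.values()), timedelta())
    return total / len(log)


# Illustrative two-stay log (made-up timestamps, not dataset values).
example_log = {
    1: [("Enter the ED", datetime(2020, 1, 1, 8)),
        ("Discharge from the ED", datetime(2020, 1, 1, 10))],
    2: [("Enter the ED", datetime(2020, 1, 1, 9)),
        ("Discharge from the ED", datetime(2020, 1, 1, 13))],
}
example_mean = mean_throughput(example_log)  # 2 h and 4 h average to 3 h
```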

Activity-sojourn times (a→b) are defined as the median time interval from the completion of activity $a$ to the next occurrence of $b$ on the same trace. For instance, in the acuity=3 cohort, 2.65% of traces transition directly from “Triage” to “Discharge” with a median interval of approximately 2.1 hours.
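A transition statistic of this kind can be sketched as below. The sketch takes the directly-follows reading, in which $b$ must be the very next event after $a$. That reading matches the "transition directly" example but is an assumption about the definition. The function name `transition_stats` and the trace layout (ordered `(activity, timestamp)` pairs) are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def transition_stats(log, a, b):
    """Return (share of traces containing a->b, median gap in seconds).

    Only directly-follows pairs are counted: b must immediately follow a.
    The median is None when the transition never occurs.
    """
    gaps = []
    traces_with_transition = 0
    for trace in log.values():
        found = False
        for (act1, t1), (act2, t2) in zip(trace, trace[1:]):
            if act1 == a and act2 == b:
                gaps.append((t2 - t1).total_seconds())
                found = True
        traces_with_transition += found
    share = traces_with_transition / len(log) if log else 0.0
    return share, (median(gaps) if gaps else None)


# Illustrative log with made-up timestamps.
_t = datetime(2020, 1, 1, 8)
example_log = {
    1: [("Enter the ED", _t),
        ("Triage in the ED", _t + timedelta(seconds=1)),
        ("Discharge from the ED", _t + timedelta(seconds=1, hours=2))],
    2: [("Enter the ED", _t),
        ("Triage in the ED", _t + timedelta(seconds=1)),
        ("Vital sign check", _t + timedelta(hours=1)),
        ("Discharge from the ED", _t + timedelta(hours=3))],
}
```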

6. Process Mining Analyses and Variants

Using tools such as Disco and PM4Py, various canonical process flows and variants have been elucidated:

Acuity-3 cohort (~50%):

  • Dominant pathway: Enter → Triage → Vital → Medicine reconciliation → Vital → Discharge
  • Direct Triage→Discharge in 2.65% of cases (median 2.1 h)
  • Consecutive Vital→Vital loops in 42.6% (median interval 99 min)
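Finding a dominant pathway amounts to counting trace variants, that is, distinct activity sequences. The sketch below is a generic illustration of that idea rather than the procedure used in Disco or PM4Py. The function names are hypothetical, and traces are assumed to be ordered lists of `(activity, timestamp)` pairs.

```python
from collections import Counter


def variant_counts(log):
    """Count how many traces share each exact activity sequence."""
    return Counter(tuple(activity for activity, _ in trace) for trace in log.values())


def dominant_variant(log):
    """Return (most frequent activity sequence, its share of all traces).

    Assumes a non-empty log.
    """
    variant, count = variant_counts(log).most_common(1)[0]
    return variant, count / len(log)
```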

Activity coverage by acuity:

| Activity | Acuity 1 (%) | Acuity 5 (%) |
| --- | --- | --- |
| Medicine dispensation | 81.5 | 27.6 |
| Vital-check loops | 73.2 | 10.0 |

Length-of-Stay (LoS) quadrant analysis:

Normal LoS $\leq 500$ min; prolonged LoS $> 500$ min. The Q4 quadrant (high acuity, prolonged LoS) exhibits:

  • Vital→Vital loops in 88% of traces, versus 58% in Q1, with a median loop duration roughly twice that of Q1
  • Medicine↔Vital transitions in 80–83% of traces, versus 50–57% in Q1, with durations roughly doubled

Crowding analysis:

Crowding for stay $c$:

$$X(c) = |\{ d \in C \mid \neg (\mathit{intime}_d > \mathit{outtime}_c\ \lor\ \mathit{outtime}_d < \mathit{intime}_c) \}|$$

  • 75th percentile threshold set at 12 simultaneous patients (“crowded” label)
  • In crowded periods, intervals between vital-sign checks are longer and trajectories to discharge are slower, especially for admitted patients (median Vital→Discharge 58 min, compared with 22 min for home discharges).
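The crowding count $X(c)$ can be computed without comparing every pair of stays. The sketch below is one possible approach, not the authors' implementation, and the function name `crowding` is hypothetical. It assumes every stay satisfies intime < outtime, as the preprocessing step guarantees. Under that assumption, the stays overlapping $c$ are those that started no later than $c$ ended, minus those that ended before $c$ started.

```python
from bisect import bisect_left, bisect_right


def crowding(stays):
    """Compute X(c) for every stay.

    stays: dict mapping stay_id -> (intime, outtime), with intime < outtime.
    As in the formula, a stay overlaps itself and touching intervals count.
    """
    intimes = sorted(t_in for t_in, _ in stays.values())
    outtimes = sorted(t_out for _, t_out in stays.values())
    return {
        stay_id: (
            bisect_right(intimes, t_out)    # stays with intime_d <= outtime_c
            - bisect_left(outtimes, t_in)   # minus stays with outtime_d < intime_c
        )
        for stay_id, (t_in, t_out) in stays.items()
    }
```

Sorting once and using binary search keeps the cost at roughly $O(|C| \log |C|)$, which matters for a log with several hundred thousand stays.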

A plausible implication is that crowding—and clinical acuity—are directly associated with process bottlenecks and increased throughput times.

7. Availability, Utility, and Prospective Applications

MIMICEL is a complete, end-to-end log of ED stays, annotated with clinical measurements, patient demographics, interventions, and disposition. The dataset is curated for compatibility with automated process mining, conformance checks, and performance analysis tools. Public access is provided for both data and code.

These resources position MIMICEL as an instrumentation-ready dataset for research into ED efficiency, bottleneck characterisation, crowding dynamics, pathway discovery, and the validation of process mining methodologies within healthcare informatics (Wei et al., 26 May 2025).

