MIMICEL: ED Process Mining Event Log
- MIMICEL is a curated dataset that captures detailed ED patient flows with timestamped activities and rich clinical and demographic attributes.
- It employs a formal process mining framework with rigorous preprocessing, activity mapping, and data cleaning to ensure accurate event sequencing.
- The dataset supports analyses of ED throughput, crowding dynamics, and process variants, and is available in both CSV and XES formats.
The MIMICEL Event Log is a curated dataset that encodes the detailed process of patient flow in the emergency department (ED) as extracted from the publicly available MIMIC-IV-ED database. It provides a rigorously structured event log—each case is an ED stay, segmented into timestamped activities—designed to support process mining analyses of ED operations. MIMICEL encompasses a comprehensive annotation of both case-level and event-level clinical, demographic, and operational data, enabling granular exploration of throughput, crowding, and clinical process variants within the ED setting (Wei et al., 26 May 2025).
1. Formal Event-Log Structure
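MIMICEL adopts a formal schema rooted in process mining conventions. The log is a multiset of traces,
$$L = \{\sigma_1, \sigma_2, \dots, \sigma_n\},$$
where $\sigma_i$ corresponds to a totally ordered sequence of events for ED stay $i$:
$$\sigma_i = \langle e_{i,1}, e_{i,2}, \dots, e_{i,m_i} \rangle,$$
with each event
$$e_{i,j} = (c_i, a_{i,j}, t_{i,j}, \mathbf{d}_{i,j})$$
consisting of case identifier $c_i$, activity label $a_{i,j}$ from a finite set $A$, timestamp $t_{i,j}$, and a vector of event-level attributes $\mathbf{d}_{i,j}$.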
The six activity labels tracked are:
- Enter the ED
- Triage in the ED
- Vital sign check
- Medicine reconciliation
- Medicine dispensation
- Discharge from the ED
Each trace carries case-level attributes sourced from ED admission tables.
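As a minimal sketch of this structure (not the authors' implementation), the event tuple and the grouping of events into traces could be represented as follows. The names `Event` and `build_log` are hypothetical, and the timestamps are made-up illustrative values.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any, NamedTuple


class Event(NamedTuple):
    """One event: case identifier, activity label, timestamp, attributes."""
    case_id: int                # stay_id of the ED stay
    activity: str               # one of the six activity labels
    timestamp: datetime
    attributes: dict[str, Any]  # event-level attributes, e.g. vitals


def build_log(events: list[Event]) -> dict[int, list[Event]]:
    """Group events by case and order each trace by timestamp."""
    traces: dict[int, list[Event]] = defaultdict(list)
    for event in events:
        traces[event.case_id].append(event)
    for trace in traces.values():
        trace.sort(key=lambda e: e.timestamp)  # list.sort is stable
    return dict(traces)


# Illustrative events for a single, hypothetical ED stay.
events = [
    Event(1, "Discharge from the ED", datetime(2110, 1, 1, 12, 0, 0), {}),
    Event(1, "Enter the ED", datetime(2110, 1, 1, 8, 0, 0), {}),
    Event(1, "Triage in the ED", datetime(2110, 1, 1, 8, 0, 1), {"pain": 4}),
]
log = build_log(events)
print([e.activity for e in log[1]])
```

Keying the log by `stay_id` mirrors the one-case-per-ED-stay convention described above.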
2. Data Extraction and Transformation Pipeline
The curation pipeline follows a modified nine-step guideline by Jans et al., realized through four primary SQL scripts, each corresponding to a major stage of event-log assembly:
- Preprocessing: Ensures timestamp integrity; removes stays where the admission time is on or after discharge.
- Activity Identification: Maps the major ED activities (the cornerstone ‘throughput’ activities per Asplin) to their respective source tables using relational keys, and constructs an ER diagram that links every table via stay_id.
- Event Formation: For each activity, relevant rows are appended to a unified table, assigning labels, timestamps, and associated measurements.
- Cleaning and Filtration: Events are filtered to only those occurring strictly within stay intervals, with stays missing any mandatory activity (enter, triage, discharge) dropped.
Conversion to process mining-compatible formats (XES) is performed using PM4Py’s csv2xes.py utility. The final dataset is made available in both CSV and XES formats to facilitate downstream process mining tool compatibility.
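The preprocessing and cleaning rules listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' SQL: the function name `clean_log`, the tuple layouts, and the sample data are hypothetical, and the stay interval is treated as inclusive at both ends so that the entry and discharge events themselves are retained.

```python
from datetime import datetime

MANDATORY = {"Enter the ED", "Triage in the ED", "Discharge from the ED"}


def clean_log(stays, events):
    """stays: {stay_id: (intime, outtime)}; events: [(stay_id, activity, timestamp)]."""
    # Preprocessing: drop stays whose arrival is on or after discharge.
    valid = {sid: (t_in, t_out) for sid, (t_in, t_out) in stays.items() if t_in < t_out}

    # Filtration: keep events inside the stay interval (assumed inclusive bounds).
    kept = [
        (sid, act, ts)
        for sid, act, ts in events
        if sid in valid and valid[sid][0] <= ts <= valid[sid][1]
    ]

    # Drop stays that lack any of the mandatory activities.
    seen = {}
    for sid, act, _ in kept:
        seen.setdefault(sid, set()).add(act)
    complete = {sid for sid, acts in seen.items() if MANDATORY <= acts}
    return [ev for ev in kept if ev[0] in complete]


stays = {
    1: (datetime(2110, 1, 1, 8, 0), datetime(2110, 1, 1, 12, 0)),
    2: (datetime(2110, 1, 2, 9, 0), datetime(2110, 1, 2, 9, 0)),  # arrival == discharge
    3: (datetime(2110, 1, 3, 7, 0), datetime(2110, 1, 3, 9, 0)),  # no triage event below
}
events = [
    (1, "Enter the ED", datetime(2110, 1, 1, 8, 0)),
    (1, "Triage in the ED", datetime(2110, 1, 1, 8, 0, 1)),
    (1, "Vital sign check", datetime(2110, 1, 1, 13, 0)),  # after discharge
    (1, "Discharge from the ED", datetime(2110, 1, 1, 12, 0)),
    (2, "Enter the ED", datetime(2110, 1, 2, 9, 0)),
    (3, "Enter the ED", datetime(2110, 1, 3, 7, 0)),
    (3, "Discharge from the ED", datetime(2110, 1, 3, 9, 0)),
]
cleaned = clean_log(stays, events)
print(cleaned)
```

In this toy input only stay 1 survives: stay 2 fails the timestamp check, stay 3 lacks a triage event, and the out-of-interval vital sign check is filtered out.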
3. Attribute Specification
MIMICEL is characterized by rich attribute annotation at both the case and event levels.
Case-level attributes:
- stay_id: integer, unique ED stay identifier
- subject_id: integer, unique patient identifier
- gender: categorical (M/F/Other)
- race: categorical
- arrival_transport: categorical (e.g., AMBULANCE, WALK_IN, HELICOPTER)
- disposition: categorical (e.g., HOME, ADMITTED, TRANSFERRED)
- acuity: integer, ESI triage level 1–5
- chiefcomplaint: string (de-identified free text)
Event-level attributes:
- activity: string, activity label
- timestamp: datetime, second-level granularity
- hadm_id: integer (hospital admission ID, null for ED home discharges)
- temperature: float (°F/°C)
- heartrate, resprate, o2sat, sbp, dbp: integer (vital measurements)
- pain: integer (0–10 self-report)
- rhythm: string (cardiac rhythm)
- med_rn: integer (dispensed meds count)
- seq_num: integer (number of diagnoses)
- name: string (medication name)
- gsn, ndc: string (drug codes)
- etc_rn, etccode, etcdescription: enhanced therapeutic class fields
- gsn_rn: integer
- icd_code, icd_version, icd_title: diagnosis fields
4. Trace Segmentation and Event Ordering
Each trace is defined as the sequence of events sharing the same stay_id, ordered by timestamp. When multiple events share an identical timestamp (e.g., simultaneous medication dispensing), ties are preserved in stable insertion order. To maintain a logical event sequence, “Triage in the ED” is timestamped one second after “Enter the ED”, so that triage always follows entry. Events outside the stay interval are excluded, yielding traces restricted to in-ED activities.
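The ordering rule can be illustrated with Python's stable sort, which keeps tied events in their original insertion order. The helper names `order_trace` and `triage_timestamp`, the tuple layout, and the sample data are hypothetical, and the one-second triage offset is an assumption about the gap the text describes.

```python
from datetime import datetime, timedelta


def triage_timestamp(enter_time):
    """Place triage just after entry (assumed offset: one second)."""
    return enter_time + timedelta(seconds=1)


def order_trace(events):
    """events: [(activity, timestamp, detail)] for one stay_id, in insertion order."""
    # sorted() is stable, so events with equal timestamps keep insertion order.
    return sorted(events, key=lambda e: e[1])


enter = datetime(2110, 1, 1, 10, 0, 0)
raw = [
    ("Enter the ED", enter, None),
    ("Medicine dispensation", enter + timedelta(minutes=30), "drug A"),
    ("Medicine dispensation", enter + timedelta(minutes=30), "drug B"),
    ("Vital sign check", enter + timedelta(minutes=15), None),
    ("Triage in the ED", triage_timestamp(enter), None),
]
ordered = order_trace(raw)
for activity, ts, detail in ordered:
    print(ts.time(), activity, detail or "")
```

Sorting on the timestamp alone, rather than on the whole tuple, is what keeps the two simultaneous dispensation events in the order they were inserted.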
5. Aggregate Statistics and Process Metrics
MIMICEL comprises:
- cases (ED stays)
- 205,466 distinct subject_id entries (patients)
- total events
- $|A| = 6$ activity types
Average trace length is , with per-case event count spanning from $3$ to $218$.
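Formally, for the trace $\sigma_i$ of ED stay $i$:
- $t_{\text{start}}(\sigma_i)$: timestamp of the first event ("Enter the ED")
- $t_{\text{end}}(\sigma_i)$: timestamp of the last event ("Discharge from the ED")
- Throughput time: $T(\sigma_i) = t_{\text{end}}(\sigma_i) - t_{\text{start}}(\sigma_i)$, with the mean taken over all traces in the log.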
Activity-sojourn times (a→b) are defined as the median time interval from the completion of activity $a$ to the next occurrence of $b$ on the same trace. For instance, in the acuity=3 cohort, 2.65% of traces transition directly from “Triage” to “Discharge” with a median interval of approximately 2.1 hours.
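A minimal sketch of these two metrics follows, assuming each trace is a time-ordered list of (activity, timestamp) pairs and that events are instantaneous, so the completion of an activity coincides with its timestamp. The function names and sample trace are hypothetical. The sojourn function follows the "next occurrence" wording above; a directly-follows variant would only inspect the immediately succeeding event.

```python
from datetime import datetime
from statistics import median


def throughput_minutes(trace):
    """Time from 'Enter the ED' to 'Discharge from the ED', in minutes."""
    start = next(ts for act, ts in trace if act == "Enter the ED")
    end = next(ts for act, ts in reversed(trace) if act == "Discharge from the ED")
    return (end - start).total_seconds() / 60


def sojourn_minutes(traces, a, b):
    """Median gap (minutes) from each occurrence of a to the next occurrence of b."""
    gaps = []
    for trace in traces:
        for i, (act, ts) in enumerate(trace):
            if act != a:
                continue
            nxt = next((t for x, t in trace[i + 1:] if x == b), None)
            if nxt is not None:
                gaps.append((nxt - ts).total_seconds() / 60)
    return median(gaps) if gaps else None


trace = [
    ("Enter the ED", datetime(2110, 1, 1, 8, 0, 0)),
    ("Triage in the ED", datetime(2110, 1, 1, 8, 0, 1)),
    ("Vital sign check", datetime(2110, 1, 1, 8, 30, 0)),
    ("Vital sign check", datetime(2110, 1, 1, 10, 0, 0)),
    ("Discharge from the ED", datetime(2110, 1, 1, 11, 0, 0)),
]
print(throughput_minutes(trace))
print(sojourn_minutes([trace], "Vital sign check", "Vital sign check"))
print(sojourn_minutes([trace], "Vital sign check", "Discharge from the ED"))
```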
6. Process Mining Analyses and Variants
Using tools such as Disco and PM4Py, various canonical process flows and variants have been elucidated:
Acuity-3 cohort (~50%):
- Dominant pathway: Enter → Triage → Vital → Medicine reconciliation → Vital → Discharge
- Direct Triage→Discharge in 2.65% of cases (median 2.1 h)
- Consecutive Vital→Vital loops in 42.6% (median interval 99 min)
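Variant frequencies and transition shares of this kind can be computed with a few lines of standard-library Python. The sketch below is not how Disco or PM4Py compute them internally; the function names and the four toy traces are hypothetical, and each trace is reduced to its ordered list of activity labels.

```python
from collections import Counter


def variant_counts(traces):
    """Count how often each distinct activity sequence (variant) occurs."""
    return Counter(tuple(trace) for trace in traces)


def share_with_transition(traces, a, b):
    """Fraction of traces in which a is directly followed by b at least once."""
    hits = sum(
        any(x == a and y == b for x, y in zip(trace, trace[1:]))
        for trace in traces
    )
    return hits / len(traces)


dominant = ["Enter", "Triage", "Vital", "Medicine reconciliation", "Vital", "Discharge"]
traces = [
    dominant,
    list(dominant),
    ["Enter", "Triage", "Discharge"],
    ["Enter", "Triage", "Vital", "Vital", "Discharge"],
]

top_variant, count = variant_counts(traces).most_common(1)[0]
print(" → ".join(top_variant), count)
print(share_with_transition(traces, "Triage", "Discharge"))
print(share_with_transition(traces, "Vital", "Vital"))
```

Counting a trace once regardless of how many times the transition repeats matches the "percentage of cases" reading of the figures above.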
Activity coverage by acuity:
| Activity | Acuity 1 (%) | Acuity 5 (%) |
|---|---|---|
| Medicine dispensation | 81.5 | 27.6 |
| Vital-check loops | 73.2 | 10.0 |
Length-of-Stay (LoS) quadrant analysis:
Normal LoS $\le 500$ min; Prolonged LoS $> 500$ min. Q4 quadrants (high-acuity, prolonged LoS) exhibit:
- Vital→Vital loops in 88% of traces, versus 58% in Q1, with a median duration roughly 2× that of Q1
- Medicine↔Vital transitions in 80–83% vs 50–57% in Q1; duration roughly doubled
Crowding analysis:
Crowding for stay $i$ is quantified as the number of patients simultaneously present in the ED:
- 75th percentile threshold set at 12 simultaneous patients (“crowded” label)
- Crowded periods show longer intervals between vital checks and slower trajectories to discharge, especially for admitted patients (median Vital→Discharge of 58 min, compared with 22 min for home-discharged patients).
A plausible implication is that both crowding and clinical acuity are directly associated with process bottlenecks and increased throughput times.
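One way to operationalise such a crowding measure is sketched below. It rests on assumptions not stated in the text: crowding is evaluated at each stay's arrival time, counts the other patients present at that moment, and a stay is labelled crowded when its count is at or above the 75th percentile. The function names and the sample stays are hypothetical.

```python
from datetime import datetime
from statistics import quantiles


def crowding_at_arrival(stays):
    """stays: {stay_id: (intime, outtime)} -> {stay_id: other patients present at intime}."""
    # Quadratic scan for clarity; a sweep over sorted times would scale better.
    return {
        sid: sum(
            1
            for other, (o_in, o_out) in stays.items()
            if other != sid and o_in <= t_in < o_out
        )
        for sid, (t_in, _) in stays.items()
    }


def label_crowded(crowding):
    """Flag stays whose crowding is at or above the 75th percentile (assumed rule)."""
    threshold = quantiles(list(crowding.values()), n=4)[2]  # third quartile
    return {sid: value >= threshold for sid, value in crowding.items()}


stays = {
    1: (datetime(2110, 1, 1, 8, 0), datetime(2110, 1, 1, 12, 0)),
    2: (datetime(2110, 1, 1, 9, 0), datetime(2110, 1, 1, 10, 0)),
    3: (datetime(2110, 1, 1, 9, 30), datetime(2110, 1, 1, 11, 0)),
    4: (datetime(2110, 1, 1, 13, 0), datetime(2110, 1, 1, 14, 0)),
}
crowding = crowding_at_arrival(stays)
labels = label_crowded(crowding)
print(crowding)
print(labels)
```

On the full log, the percentile would be computed over all stays rather than over a toy sample like this one.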
7. Availability, Utility, and Prospective Applications
MIMICEL is a complete, end-to-end log of ED stays, annotated with clinical measurements, patient demographics, interventions, and disposition. The dataset is curated for compatibility with automated process mining, conformance checks, and performance analysis tools. Public access is provided for both data and code:
- CSV/XES: https://physionet.org/content/mimicel-ed/2.1.0/
- Extraction scripts: https://github.com/ZhipengHe/MIMIC-IV-event-log-extraction-for-ED
These resources position MIMICEL as an instrumentation-ready dataset for research into ED efficiency, bottleneck characterisation, crowding dynamics, pathway discovery, and the validation of process mining methodologies within healthcare informatics (Wei et al., 26 May 2025).