FHIR Sequential Representation
- FHIR-based sequential representation is a method that maps EHR events into standardized FHIR resources and tokenizes them for sequential deep learning inputs.
- It employs mapping, tokenization, and temporal ordering to support scalable digital twin construction and real-time clinical decision support.
- The approach enhances interoperability and predictive accuracy, demonstrated by high AUROC scores across multiple clinical tasks in large-scale datasets.
A FHIR-based sequential representation is a data construction and processing paradigm in which electronic health record (EHR) events are mapped to Fast Healthcare Interoperability Resources (FHIR), tokenized, temporally ordered, and embedded as input sequences for deep learning or digital twin frameworks. This approach supports both structured (database-derived) and unstructured (free-text) clinical data, yielding highly generalizable, scalable, and interoperable representations suitable for predictive modeling, patient digital twin construction, and real-time clinical decision support (Rajkomar et al., 2018, Brens et al., 9 Jan 2026).
1. Mapping and Tokenization of EHR Data
The initial step involves mapping atomic clinical events—spanning encounter records, laboratory results, medication orders, flowsheet entries, diagnosis codes, and free-text notes—onto standardized FHIR resource types. Each atomic event (e.g., medication administration, lab measurement, narrative sentence) is converted to its corresponding FHIR resource (Patient, Encounter, Condition, Procedure, MedicationRequest, Observation, or Note). Within a FHIR resource, both structured attributes (e.g., code, value, dose) and unstructured fields (e.g., narrative text) are decomposed into discrete “tokens”:
- Categorical/textual fields: Drug names, ICD codes, and words from text notes become vocabulary tokens.
- Numerical fields: Values (lab results, vitals) are normalized (e.g., z-scored or clipped to physiological ranges), potentially discretized into bins, or used as real-valued inputs.
- Temporal information: Each token preserves the source event’s timestamp, with no site-specific harmonization beyond FHIR mapping (Rajkomar et al., 2018).
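The tokenization rules above can be sketched in a few lines of Python. The function and vocabulary names (`tokenize_event`, `VOCAB`) and the example codes and statistics are illustrative assumptions, not artifacts of the cited papers:

```python
from datetime import datetime

# Illustrative sketch: categorical/textual fields become vocabulary tokens,
# numeric fields are z-scored, and each token keeps its source timestamp.
VOCAB = {"<unk>": 0}

def token_id(term):
    """Return a stable integer id for a categorical/textual token."""
    return VOCAB.setdefault(term, len(VOCAB))

def z_score(value, mean, std):
    """Normalize a numeric field (e.g., a lab value) to z-units."""
    return (value - mean) / std if std else 0.0

def tokenize_event(event):
    """Map one atomic FHIR-mapped event to (token_id, value, timestamp)."""
    tid = token_id(f"{event['resource']}:{event['code']}")
    val = z_score(event["value"], *event["stats"]) if "value" in event else None
    return tid, val, event["time"]

events = [
    {"resource": "Observation", "code": "LOINC:718-7",        # hemoglobin
     "value": 13.2, "stats": (13.5, 1.5), "time": datetime(2020, 1, 1, 8)},
    {"resource": "MedicationRequest", "code": "RxNorm:29046",  # lisinopril
     "time": datetime(2020, 1, 1, 9)},
]
tokens = [tokenize_event(e) for e in events]
```

Note that no site-specific harmonization happens here: the vocabulary is built directly from FHIR resource types and standard codes.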
For unstructured EHRs, a pipeline of transformer-based Named Entity Recognition (NER), concept normalization (mapping to canonical codes in SNOMED-CT, ICD-10, RxNorm, or LOINC), and relation extraction is employed. Detected entities and their contextual/temporal relations are used to populate and assemble FHIR R4-compliant resources (Brens et al., 9 Jan 2026).
2. Assembly of Temporally Ordered FHIR Sequences
After tokenization and mapping, a patient’s longitudinal digital phenotype is represented as a temporal sequence of FHIR tokens:
- Events $e_1, e_2, \dots, e_n$, each with a timestamp $t_i$ and a resource-attribute type.
- Sequences are either grouped into regular time-steps or left as fully unrolled, variable-length event chains (typically one step per token).
- For each tokenized event $e_i$, a $d$-dimensional embedding $x_i \in \mathbb{R}^d$ is derived: an embedding lookup for categorical/textual tokens, or a linear projection $x_i = W v_i + b$ (with weights $W$ and bias $b$) for numerical values $v_i$.
- Optionally, a learned or analytic temporal embedding $\tau(t_i)$ (e.g., sinusoidal encoding or discrete bucket indexing) is added: $x_i \leftarrow x_i + \tau(t_i)$ (Rajkomar et al., 2018).
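A minimal sketch of this embedding step, assuming a lookup table for categorical tokens, a linear projection for numeric values, and a Transformer-style sinusoidal temporal encoding; the dimension and all weight values are illustrative, not from the cited papers:

```python
import math
import random

# Hypothetical embedding sketch: lookup for categorical tokens, linear
# projection (W_v, b_v) for numeric values, plus a sinusoidal time encoding.
d = 8
random.seed(0)
EMB = {}  # token id -> d-dim vector (here: randomly initialized stand-ins)
W_v = [random.gauss(0, 0.1) for _ in range(d)]
b_v = [0.0] * d

def embed_token(token_id, value=None):
    e = EMB.setdefault(token_id, [random.gauss(0, 0.1) for _ in range(d)])
    if value is not None:  # linear projection of the normalized number
        e = [ei + wi * value + bi for ei, wi, bi in zip(e, W_v, b_v)]
    return e

def temporal_encoding(t_hours):
    """Sinusoidal encoding of elapsed time, Transformer-style."""
    return [math.sin(t_hours / 10000 ** (2 * (i // 2) / d)) if i % 2 == 0
            else math.cos(t_hours / 10000 ** (2 * (i // 2) / d))
            for i in range(d)]

def embed_event(token_id, value, t_hours):
    """Token embedding plus temporal embedding, summed elementwise."""
    return [a + b for a, b in zip(embed_token(token_id, value),
                                  temporal_encoding(t_hours))]

x = embed_event(42, -0.2, t_hours=3.0)
```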
For digital twin pipelines, temporality is enforced by populating "effectiveDateTime," "onsetDateTime," or "authoredOn" fields, sorting the final FHIR resource bundle by these fields, and replaying this ordered collection to recover the patient’s evolving clinical state. Resources can be exported as a FHIR Bundle (JSON) (Brens et al., 9 Jan 2026).
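The temporal replay step can be sketched as sorting plain-dict FHIR R4 resources by whichever temporal field each resource type populates; the example resources below are invented for illustration:

```python
# Minimal sketch: pick each resource's temporal field (effectiveDateTime,
# onsetDateTime, or authoredOn) and sort the bundle entries chronologically.
TEMPORAL_FIELDS = ("effectiveDateTime", "onsetDateTime", "authoredOn")

def event_time(resource):
    for field in TEMPORAL_FIELDS:
        if field in resource:
            return resource[field]  # ISO-8601 strings sort chronologically
    return ""  # undated resources sort first

resources = [
    {"resourceType": "MedicationRequest", "authoredOn": "2020-01-02T09:00:00Z"},
    {"resourceType": "Condition", "onsetDateTime": "2020-01-01T07:30:00Z"},
    {"resourceType": "Observation", "effectiveDateTime": "2020-01-01T08:00:00Z"},
]
bundle = {"resourceType": "Bundle", "type": "collection",
          "entry": [{"resource": r} for r in sorted(resources, key=event_time)]}
```

Replaying `bundle["entry"]` in order then reconstructs the patient's evolving clinical state.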
3. Deep Learning and Predictive Modeling Architectures
The sequential FHIR representation acts as the direct input to several neural architectures:
- LSTM (Long Short-Term Memory): One- or two-layer recurrent neural network processes the ordered , maintaining hidden and cell states with standard gating equations, culminating in an output prediction via a fully connected head and sigmoid or softmax layer.
- Time-Aware Attention Neural Network (TANN): Applies attention-weighted pooling over hidden and input states, computing weights $\alpha_i = \operatorname{softmax}_i\!\big(s(h_i, t_i)\big)$ and a context vector $c = \sum_i \alpha_i h_i$.
The final context vector $c$ drives the downstream classifier (Rajkomar et al., 2018).
- Boosted Time-Stump Ensemble: Consists of simple, time-based decision stumps (e.g., if any lab exceeds a threshold at any time), learned through gradient boosting to capture localized temporal interactions.
The final prediction is produced by an ensemble (weighted averaging or stacking) of all three model types. All models in the ensemble share the same FHIR-sequence input without site-specific engineered features.
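The attention-weighted pooling used by the TANN component can be sketched in a few lines; the dot-product scoring function against a query vector is an assumption for illustration, not the paper's exact scoring form:

```python
import math

# Illustrative attention pooling: scores over per-event hidden states are
# softmax-normalized into weights, and the context vector is their
# weighted sum. The weights double as per-event attribution values.
def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    scores = [sum(hi * qi for hi, qi in zip(h, query)) for h in hidden_states]
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    c = [sum(a * h[j] for a, h in zip(alphas, hidden_states))
         for j in range(dim)]
    return c, alphas

h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
c, alphas = attention_pool(h, query=[1.0, 0.0])
```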
4. Evaluation Metrics, Benchmarking, and Attribution
The FHIR-sequential approach has been empirically validated across large-scale cohorts and multiple hospitals:
- Datasets: UCSF (85,522 train + 9,624 test admissions), Chicago (108,948 train + 12,127 test), with a total of 46,864,534,945 FHIR tokens across both sites (Rajkomar et al., 2018).
- Prediction tasks:
- In-hospital mortality (AUROC: 0.95 UCSF, 0.93 Chicago; vs. aEWS baselines 0.85/0.86)
- 30-day unplanned readmission (AUROC: 0.77/0.76; baselines 0.70/0.68)
- Long length of stay (≥7 days) (AUROC: 0.86/0.85; baselines 0.76/0.74)
- Discharge diagnosis assignment (weighted-AUROC: 0.90; baseline not stated)
These results demonstrate that deep learning substantially outperforms traditional handcrafted predictors, without per-site harmonization or feature re-engineering.
For unstructured narrative-to-FHIR digital twin pipelines, NER achieved F1=0.89, relation extraction F1=0.81, semantic completeness 91%, and interoperability 0.88 against the MIMIC-IV-on-FHIR reference, significantly exceeding rule-based and naive baselines (Brens et al., 9 Jan 2026).
Attribution of predictions is realized in two ways:
- For TANN, attention weights ($\alpha_i$) indicate each token's contribution.
- For LSTM, gradient-based saliency or integrated gradients assign each token a score, which is color-mapped to the original chart fields, establishing transparent model-evidence overlays.
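A hedged sketch of such an evidence overlay: given per-token weights (attention weights for TANN, or absolute integrated-gradient scores for the LSTM), the original chart tokens are ranked so the highest-evidence entries can be highlighted. The token strings and weights here are invented for illustration:

```python
# Rank chart tokens by their attribution weight; the top-k entries would
# be color-mapped back onto the original chart fields in a real overlay.
def top_evidence(tokens, weights, k=3):
    ranked = sorted(zip(tokens, weights), key=lambda tw: tw[1], reverse=True)
    return ranked[:k]

tokens = ["lactate=4.1", "ICD10:A41.9", "norepinephrine", "hemoglobin=13.2"]
weights = [0.42, 0.31, 0.19, 0.08]  # e.g., attention alphas; illustrative
evidence = top_evidence(tokens, weights)
```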
5. Workflow Generalizability, Extensibility, and Scalability
Workflow generalizability is an inherent property of the FHIR-based sequential approach:
- Every prediction task and data source (structured or unstructured, any FHIR-compliant EHR) uses the same input assembly pipeline, obviating hand-engineered features.
- Demonstrated on two geographically and technically distinct centers with nearly identical predictive performance.
- Pipeline scales to >200k tokens per patient; inference per patient is a matter of milliseconds given trained models (Rajkomar et al., 2018).
- Extension to new tasks (e.g., new outcomes or phenotyping) requires only model adaptation, not FHIR remapping or feature generation.
For NLP-driven digital twin workflows, ontology-based normalization and relation extraction allow rapid mapping of highly variable free-text notes to interpretable, temporally ordered digital twins with high schema completeness and terminology concordance, supporting downstream real-time or batch analytics (Brens et al., 9 Jan 2026).
6. Formal Mappings and Example Resource Assemblies
The pipeline formalizes the path from raw data or text to FHIR-sequence via explicit mathematical mappings:
- Concept normalization assigns standard coding tuples (system, code, display).
- Entity-to-resource mapping constructs type-specific FHIR resources (Condition, Observation, MedicationRequest).
- Relations augment resource fields, for example adding "has-dosage" links between a medication and its dosage.
- The resource set is bundled into a FHIR Bundle and sorted chronologically by its effectiveDateTime or authoredOn fields (Brens et al., 9 Jan 2026).
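Concept normalization can be sketched as a lookup from a surface mention to a (system, code, display) coding tuple; the tiny lexicon below is an illustrative stand-in for a full SNOMED-CT/RxNorm/LOINC terminology service:

```python
# Illustrative normalization lexicon; real pipelines query a terminology
# service rather than a hard-coded dict.
LEXICON = {
    "type 2 diabetes": ("http://snomed.info/sct", "44054006",
                        "Diabetes mellitus type 2"),
    "lisinopril": ("http://www.nlm.nih.gov/research/umls/rxnorm", "29046",
                   "lisinopril"),
}

def normalize(mention):
    """Map a free-text mention to a (system, code, display) tuple, or None."""
    return LEXICON.get(mention.strip().lower())

coding = normalize("Type 2 Diabetes")
```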
Example FHIR resource assemblies (see table):
| Resource Type | Key Fields Populated | Example Code/Ontology |
|---|---|---|
| Condition | onsetDateTime, code, clinicalStatus | SNOMED-CT:44054006 |
| Observation | effectiveDateTime, code, valueQuantity | LOINC:85354-9 |
| MedicationRequest | authoredOn, medicationCodeableConcept, dosageInstruction | RxNorm:29046 |
Clients can replay a sorted FHIR Bundle to reconstruct the patient’s longitudinal state with millisecond-level resolution.
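Following the table's Condition row, such a resource might be assembled as a plain Python dict serialized to FHIR R4 JSON; the patient reference and dates are invented for the example:

```python
import json

# Illustrative FHIR R4 Condition resource for the table's SNOMED-CT code;
# the subject reference and onset date are hypothetical.
condition = {
    "resourceType": "Condition",
    "clinicalStatus": {"coding": [{
        "system": "http://terminology.hl7.org/CodeSystem/condition-clinical",
        "code": "active"}]},
    "code": {"coding": [{
        "system": "http://snomed.info/sct",
        "code": "44054006",
        "display": "Diabetes mellitus type 2"}]},
    "onsetDateTime": "2019-06-15",
    "subject": {"reference": "Patient/example"},  # hypothetical id
}
condition_json = json.dumps(condition, indent=2)
```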
7. Interoperability and Impact on Clinical Informatics
A principal advantage is the mechanical, ontology-grounded, and lossless mapping from local EHR artifacts to the interoperable FHIR schema. No per-hospital harmonization is needed beyond standardization to FHIR, facilitating multi-center and multi-vendor deployments. The representation jointly supports both high-fidelity machine learning and digital twin construction.
Compared to variable curation pipelines, the FHIR-sequential paradigm yields significant improvements in:
- Schema completeness (91% field coverage vs. ~60% for rule-based methods)
- Terminological interoperability (0.88 match score to reference)
- Downstream predictive accuracy (substantial AUROC gains across clinical tasks)
- Interpretability (token-level attributions mapped to original chart context)
A plausible implication is that adoption of FHIR-based sequential representations will underpin robust, extensible, and explainable clinical AI systems and interoperable patient modeling, with minimal need for manual feature curation or custom integration engineering (Rajkomar et al., 2018, Brens et al., 9 Jan 2026).