Health System Learning in Healthcare
- Health system learning is a paradigm that trains AI models directly on raw, uncurated clinical data to continuously enhance patient care.
- It leverages advanced computational frameworks such as Vol-JEPA for neuroimaging and federated learning to ensure secure, scalable, and privacy-preserving model training.
- Practical insights include improved diagnostic accuracy, optimized operational workflows, and robust integration of AI into clinical decision support systems.
Health system learning is a paradigm in which data-driven models and computational frameworks for healthcare are developed not from filtered research datasets, but by direct, continuous engagement with the uncurated data streams and workflows present within real-world health systems. The goal is to achieve a self-improving system in which models and processes adapt based on cumulative experience, leading to robust, scalable, and clinically actionable intelligence. This article reviews core definitions, methodologies, architectures, empirical results, and future challenges, with a technical focus on recent developments such as the NeuroVFM model for neuroimaging, federated and decentralized learning for privacy preservation, and the integration of health system learning with operational and clinical decision support systems.
1. Definition and Conceptual Foundations
Health system learning (HSL) is defined as the direct training of AI models using the raw, uncurated, and operational data generated in routine clinical care at large health systems, in contrast to models trained only on public datasets or curated research cohorts. This paradigm encompasses a wide spectrum of “learning” approaches, but targets large-scale, continuous ingestion of clinical data such as DICOM imaging archives, EHR records, laboratory data, and provider notes, including the full diversity of patient presentations, rare pathologies, protocol variations, and authentic artifacts. Supervision is often provided via self-supervised or weakly supervised methods, leveraging weak labels parsed from radiology reports or other clinical text (Kondepudi et al., 23 Nov 2025).
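To make the weak-supervision step concrete, the toy pass below maps lexical patterns in report text to binary finding labels. The patterns, negation rule, and label set are invented for illustration; production pipelines use vetted rule sets or report-level language models rather than this minimal sketch.

```python
import re

# Toy weak-labeling pass over radiology report text. Patterns and the
# label vocabulary are illustrative assumptions, not a validated rule set.
PATTERNS = {
    "hemorrhage": re.compile(r"\bhemorrhage\b", re.I),
    "midline_shift": re.compile(r"\bmidline shift\b", re.I),
}
NEGATION = re.compile(r"\bno (acute )?(hemorrhage|midline shift)\b", re.I)

def weak_labels(report: str) -> dict:
    """Return {finding: bool} labels parsed from free-text report."""
    negated = {m.group(2).replace(" ", "_") for m in NEGATION.finditer(report)}
    return {name: bool(pat.search(report)) and name not in negated
            for name, pat in PATTERNS.items()}

print(weak_labels("No acute hemorrhage. 3 mm midline shift is present."))
# {'hemorrhage': False, 'midline_shift': True}
```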
This stands in contrast to prevalent practices in which models are trained on filtered, web-scraped, or research-quality datasets, often lacking the variability and idiosyncrasies of real clinical practice. As a conceptual reference, the Institute of Medicine’s definition of a Learning Healthcare System (LHS) emphasizes the property that every patient’s care experience becomes an opportunity for learning, closing a feedback loop between care delivery (“data-to-knowledge”), updated practice (“knowledge-to-practice”), and continuous system improvement (Madduri et al., 29 Sep 2024).
2. Architectural and Algorithmic Frameworks
The technical architectures for health system learning typically involve multi-modal, multi-institutional data environments, supporting both centralized and distributed learning approaches.
Volumetric Joint-Embedding Predictive Architecture
A signature HSL implementation is the Vol-JEPA (Volumetric Joint-Embedding Predictive Architecture), used for 3D neuroimaging. The core objective is latent-space prediction:

$$\mathcal{L}_{\text{JEPA}} \;=\; \frac{1}{|T|} \sum_{i \in T} \big\lVert\, g_\phi\big(f_\theta(x_C),\, m_i\big) - \bar{f}_{\bar{\theta}}(x_i) \,\big\rVert_2^2,$$

where $x_C$ is the set of visible 3D patches (context), $\{x_i\}_{i \in T}$ the masked targets, $f_\theta$ a student encoder, $\bar{f}_{\bar{\theta}}$ a teacher encoder (EMA of $f_\theta$), $g_\phi$ a predictor, and $m_i$ learned tokens per target patch. This architecture avoids the need for decoders and heavy augmentations, training efficiently at scale (Kondepudi et al., 23 Nov 2025).
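A minimal PyTorch sketch of this latent-prediction objective follows. The linear encoders, mean-pooled context summary, and EMA momentum are illustrative stand-ins, not the published NeuroVFM configuration (the real model encodes volumetric patch tokens with far larger networks).

```python
import copy
import torch
import torch.nn as nn

class JEPASketch(nn.Module):
    """Minimal JEPA-style student/teacher with latent-space regression."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.student = nn.Linear(dim, dim)          # stand-in for a 3D encoder
        self.teacher = copy.deepcopy(self.student)  # EMA copy, never backpropagated
        for p in self.teacher.parameters():
            p.requires_grad = False
        self.predictor = nn.Linear(dim, dim)
        self.mask_token = nn.Parameter(torch.zeros(dim))

    @torch.no_grad()
    def update_teacher(self, momentum: float = 0.996):
        # teacher <- momentum * teacher + (1 - momentum) * student
        for t, s in zip(self.teacher.parameters(), self.student.parameters()):
            t.mul_(momentum).add_(s, alpha=1 - momentum)

    def loss(self, context_patches: torch.Tensor, target_patches: torch.Tensor):
        # Encode visible context, predict target latents from a context
        # summary plus a learned mask token, regress onto teacher latents.
        ctx = self.student(context_patches)              # (B, N_ctx, D)
        pooled = ctx.mean(dim=1, keepdim=True)           # crude context summary
        pred = self.predictor(pooled + self.mask_token)  # (B, 1, D)
        with torch.no_grad():
            tgt = self.teacher(target_patches).mean(dim=1, keepdim=True)
        return nn.functional.mse_loss(pred, tgt)
```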
Federated and Decentralized Learning
To address privacy, data silos, and regulatory issues, federated learning (FL) and decentralized machine learning are widely used (Rehman et al., 2022, Kimovski et al., 2022). Federated optimization minimizes

$$\min_{w}\; F(w) \;=\; \sum_{k=1}^{K} \frac{n_k}{n}\, F_k(w), \qquad F_k(w) \;=\; \frac{1}{n_k} \sum_{i \in \mathcal{D}_k} \ell(w;\, x_i, y_i),$$

where each client $k$ holds a local dataset $\mathcal{D}_k$ of $n_k$ of the $n$ total samples, updates are computed locally, and model parameters are securely aggregated globally. For additional privacy, protocols such as secure aggregation, differential privacy (e.g., Gaussian/Laplace noise under explicit privacy budgets), and homomorphic encryption (Paillier, CKKS) are integrated (Madduri et al., 29 Sep 2024). More advanced frameworks, such as confederated learning, enable secure model training even with vertically partitioned or identity-separated data modalities (Liu et al., 2019).
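A toy FedAvg round under this objective is sketched below with a least-squares local model. The model, synthetic data, and local update rule are placeholder assumptions; in deployment, per-client updates would be masked by secure aggregation rather than summed in the clear.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=1):
    """One client's local SGD steps on a mean-squared-error objective."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of local MSE loss F_k
        w -= lr * grad
    return w

def fedavg_round(w_global, clients):
    """clients: list of (X_k, y_k) local datasets; returns aggregated weights."""
    n_total = sum(len(y) for _, y in clients)
    w_new = np.zeros_like(w_global)
    for X, y in clients:
        w_k = local_update(w_global, X, y)
        w_new += (len(y) / n_total) * w_k  # secure aggregation would mask w_k here
    return w_new

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 4)), rng.normal(size=50)) for _ in range(3)]
w = np.zeros(4)
for _ in range(20):
    w = fedavg_round(w, clients)
```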
Feedback and Simulation Integration
HSL architectures often incorporate discrete-event simulation, reinforcement learning, or operational forecasting subsystems for process-level optimization and impact evaluation. This includes RL-driven hospital simulation models, continuous feedback loops for care management, and pipeline integration with both EHRs and hospital logistics (Mahyoub, 2022, Allen et al., 2020, Chung et al., 2022).
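The toy discrete-event simulation below (using the simpy library, an assumed dependency) illustrates the kind of referral-queue process model such feedback loops optimize. Arrival and service rates are invented; in practice they would be fit to health-system data, with a policy layer (e.g., an RL agent) adjusting capacity or triage rules.

```python
import random
import simpy

WAITS = []  # recorded referral delays, minutes

def patient(env, clinic):
    arrived = env.now
    with clinic.request() as slot:
        yield slot                       # wait for an appointment slot
        WAITS.append(env.now - arrived)  # record referral delay
        yield env.timeout(random.expovariate(1 / 30.0))  # ~30 min visit

def arrivals(env, clinic):
    while True:
        yield env.timeout(random.expovariate(1 / 20.0))  # ~1 arrival / 20 min
        env.process(patient(env, clinic))

env = simpy.Environment()
clinic = simpy.Resource(env, capacity=2)   # two concurrent appointment slots
env.process(arrivals(env, clinic))
env.run(until=8 * 60)                      # one 8-hour day
if WAITS:
    print(f"mean wait: {sum(WAITS) / len(WAITS):.1f} min over {len(WAITS)} patients")
```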
3. Data Assets, Preprocessing, and Self-Supervision
A central feature of health system learning is the scale and heterogeneity of clinical data used for model development. For instance, the UM-NeuroImages corpus comprises 566,915 studies and over 5.2 million 3D MRI/CT volumes spanning two decades at Michigan Medicine, allowing models to capture rare diseases, device artifacts, and real-world protocol variation (Kondepudi et al., 23 Nov 2025). Systematic preprocessing pipelines perform intensity normalization, patch tokenization, quantization (e.g., CT Hounsfield windowing), and weakly supervised label extraction from radiology reports.
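Two of these preprocessing steps, CT Hounsfield windowing and 3D patch tokenization, are sketched below. The window bounds and patch size are illustrative choices, not the settings of the NeuroVFM pipeline.

```python
import numpy as np

def window_hu(volume, lo=-20.0, hi=180.0):
    """Clip a CT volume to a Hounsfield window and rescale to [0, 1]."""
    v = np.clip(volume.astype(np.float32), lo, hi)
    return (v - lo) / (hi - lo)

def tokenize_patches(volume, patch=16):
    """Split a (D, H, W) volume into non-overlapping patch tokens."""
    d, h, w = (s - s % patch for s in volume.shape)  # crop to patch multiples
    v = volume[:d, :h, :w]
    v = v.reshape(d // patch, patch, h // patch, patch, w // patch, patch)
    v = v.transpose(0, 2, 4, 1, 3, 5)   # group the three patch axes together
    return v.reshape(-1, patch ** 3)    # (n_tokens, voxels_per_patch)

ct = np.random.uniform(-1000, 2000, size=(64, 128, 128))  # fake HU volume
tokens = tokenize_patches(window_hu(ct))
print(tokens.shape)  # (256, 4096) for 16^3 patches
```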
Self-supervised learning is dominant: models learn from uncurated data by predicting masked contexts or generating structured outputs from raw volumes. Heavy augmentations are not required; intrinsic anatomical and pathologic variation is provided by the raw health system data itself.
4. Empirical Performance, Emergent Abilities, and Visual Grounding
HSL-trained foundation models have demonstrated state-of-the-art or expert-comparable performance across diverse clinical tasks. Key metrics from NeuroVFM (Kondepudi et al., 23 Nov 2025):
- Head CT (82 tasks): AUROC = 92.7 ± 0.4%
- Brain MRI (74 tasks): AUROC = 92.5 ± 0.4%
- Public benchmarks: e.g., RSNA-ICH hemorrhage detection F1 = 0.84, CQ500 ICH-related AUROC 0.93–0.97.
Emergent properties include:
- Robust anatomical representation: patch-level embeddings cluster by neuroanatomic region, enabling zero-shot patch matching across subjects (a minimal matching sketch follows this list).
- Cross-modal transfer: MRI-trained classifiers transfer to CT equivalents with negligible performance change (|ΔAUROC| ≤ 0.05).
- Attention-driven grounding: attention-based classifier heads (AB-MIL) localize pathologies with high accuracy (78–85% “pointing game” accuracy), enabling interpretable predictions and visual rationale.
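As a concrete illustration of the zero-shot patch matching above, the sketch below pairs patch embeddings from two subjects by cosine similarity. The embeddings are random placeholders standing in for encoder outputs; the actual NeuroVFM matching procedure is not specified here.

```python
import numpy as np

def cosine_match(query, reference):
    """For each row of `query`, return the index of the most similar
    row of `reference` under cosine similarity."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    r = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    return (q @ r.T).argmax(axis=1)

rng = np.random.default_rng(0)
emb_a = rng.normal(size=(256, 768))   # subject A patch embeddings (placeholder)
emb_b = rng.normal(size=(256, 768))   # subject B patch embeddings (placeholder)
matches = cosine_match(emb_a, emb_b)  # anatomic correspondence, zero-shot
```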
Scaling laws are observed: model size and data volume yield log-linear improvements with no early saturation, and HSL-pretrained models reach baseline diagnostic performance with fewer annotated examples than internet-pretrained counterparts.
5. Privacy, Security, and Decentralized Paradigms
HSL deployment must address strict privacy, data governance, and security requirements. Federated and decentralized learning paradigms (see table) are central to scalable, privacy-preserving learning.
| Paradigm | Key Mechanisms | Example Results |
|---|---|---|
| FL w/ blockchain | PoW blockchains log model updates; RTS-DELM IDS | Disease prediction 97% accuracy (Rehman et al., 2022) |
| Decentralized ML | Permissioned ledgers, DP, SMPC, edge/fog/cloud | ML training time reduced by up to 60% (Kimovski et al., 2022) |
| Confederated FL | cGAN-based cross-modal imputation, FedAvg | AUC-ROC close to centralized bounds (Liu et al., 2019) |
Differential privacy and secure aggregation are standard, with per-client (or per-subsystem) privacy budgets carefully tuned to minimize accuracy loss (<2–5% drop for ε ≈ 1–10) (Madduri et al., 29 Sep 2024). Decentralized architectures reduce single points of failure and support federated governance, allowing both large and small organizations to participate without data centralization (Kimovski et al., 2022).
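For reference, the classical Gaussian-mechanism calibration ties the noise scale to budgets in this range: σ = √(2 ln(1.25/δ)) · Δ / ε, valid for ε ≤ 1 (production systems use tighter privacy accountants). A minimal sketch, with illustrative budget values:

```python
import math

def gaussian_sigma(epsilon: float, delta: float, sensitivity: float = 1.0) -> float:
    """Noise scale for the classical (epsilon, delta)-DP Gaussian mechanism."""
    return math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon

for eps in (0.5, 1.0):
    print(f"epsilon={eps}: sigma={gaussian_sigma(eps, delta=1e-5):.2f}")
```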
6. Operational Integration, Continuous Learning, and Process Impact
Beyond model performance, system-level integration and feedback are essential for operational impact:
- Early warning systems (EWS) in multi-institution hospital settings leverage continuous data ingestion, automated preprocessing, gradient-boosted and random forest models, and closed-loop retraining. Gains include a 25-percentage-point improvement in AUC over standard protocols, robustness to clinical data heterogeneity, and generalizability across sites (Kobylarz et al., 2020).
- Process-aware learning for tasks like health referral management integrates ML forecasting with discrete event simulation. Self-improving cycles reduce average referral delays by 39–48% across hospitals (Mahyoub, 2022).
- Decision-aware ML for resource allocation embeds the downstream optimization objective into the ML training loss (see the sketch after this list), reducing unmet demand in national medicine distribution systems by 81.9% on average (Chung et al., 2022).
- Dynamic, hierarchical Bayesian patient monitoring (e.g., for COVID-19) provides real-time, personalized prognosis with continuous model curation, rigorous calibration, and clinician-facing interpretability (Wang et al., 2021).
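A minimal sketch of the decision-aware idea from the resource-allocation item above: the training loss combines forecast error with a differentiable penalty on the unmet demand implied by the allocation the forecast would drive. The proportional-allocation rule and penalty weight are illustrative assumptions, not the method of Chung et al.

```python
import torch

def decision_aware_loss(pred_demand, true_demand, supply, lam=1.0):
    """Forecast MSE plus a penalty on downstream allocation shortfall."""
    forecast_err = torch.mean((pred_demand - true_demand) ** 2)
    # Allocate supply proportionally to predicted demand, then penalize
    # unmet realized demand; gradients flow through the allocation step.
    alloc = supply * pred_demand / pred_demand.sum()
    unmet = torch.clamp(true_demand - alloc, min=0).sum()
    return forecast_err + lam * unmet

pred = torch.tensor([10.0, 5.0, 8.0], requires_grad=True)  # site forecasts
true = torch.tensor([12.0, 4.0, 9.0])                      # realized demand
loss = decision_aware_loss(pred, true, supply=torch.tensor(20.0))
loss.backward()
```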
7. Standardization, Interoperability, and Future Challenges
Sustaining health system learning at scale requires robust standardization of care process specifications (CCPS) and interoperability frameworks:
- Standardizing CCPS (e.g., caremaps via TaSC) enables EHR and LHS integration, cost containment (a reported 27-fold reduction in per-caremap cost), and accelerated development and comprehension (McLachlan, 2020).
- Interoperability standards (e.g., HL7 FHIR, DICOM, SNOMED CT/ICD-10) and openly documented APIs are prerequisites for reliable, system-spanning learning cycles, especially in resource-limited contexts (Mathur et al., 2020); a minimal FHIR read is sketched after this list.
- Persistent challenges include privacy-utility trade-offs for multimodal FL, sample alignment in vertical FL, cost-aware orchestration, and formal federated governance (Madduri et al., 29 Sep 2024).
- There is a need for benchmarks, evaluation standards, and cross-site validation to drive generalizable, equitable advances.
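As a concrete interoperability example, the sketch below performs a standard FHIR REST read (GET [base]/Patient/[id]). The base URL points at the HL7 community's public HAPI test server, and the resource id is assumed to exist there; any FHIR R4 endpoint exposes the same interface.

```python
import requests

# Minimal FHIR REST read; server and resource id are illustrative assumptions.
base = "https://hapi.fhir.org/baseR4"  # public HAPI FHIR test server
resp = requests.get(
    f"{base}/Patient/example",
    headers={"Accept": "application/fhir+json"},
)
resp.raise_for_status()
patient = resp.json()  # a FHIR Patient resource as JSON
print(patient["resourceType"], patient.get("id"))
```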
In summary, health system learning enables foundation models and decision systems to “know the territory” rather than merely the “map.” By tightly integrating uncurated operational data, scalable self-supervision, privacy-preserving distributed optimization, and continuous feedback into informatics and care delivery, HSL yields interpretable, robust, and clinically actionable models that outperform internet-scale public baselines across diagnostic, operational, and resource-allocation tasks. This sets a blueprint for the next generation of generalist medical AI and self-improving health systems (Kondepudi et al., 23 Nov 2025).