Optimal scaling of model size and compute for generative medical event foundation models

Determine whether the choices of model size and training compute used in generative medical event foundation models are optimal, and rigorously characterize how model size and training token count should scale with available real-world medical event data to achieve compute-optimal training across patient timelines.

Background

Foundation models trained on longitudinal medical event sequences have been developed, but prior work has not systematically studied how model size and compute should be chosen relative to available data. This gap creates uncertainty about whether existing configurations are optimal and how to scale them appropriately for large datasets like Epic Cosmos.

To address this question, the paper performs a large-scale scaling-law analysis on Cosmos data. The open question, stated explicitly in the paper’s Introduction, motivates its methodology and its results on compute-optimal training regimes.
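As a rough illustration of what a scaling-law analysis estimates, the sketch below fits a power law of the form N_opt ≈ k · C^a to pairs of training compute and loss-minimizing model size, using least squares in log-log space. The function name `fit_power_law`, the symbols `k` and `a`, and all numeric values are hypothetical and synthetic. They are not taken from the paper, and the paper's actual fitting procedure may differ.

```python
import math


def fit_power_law(compute, optimal_size):
    """Fit optimal_size ~ k * compute**a by ordinary least squares on logs.

    Returns (k, a). Both input sequences must contain positive numbers.
    """
    log_c = [math.log(c) for c in compute]
    log_n = [math.log(n) for n in optimal_size]
    count = len(log_c)
    mean_c = sum(log_c) / count
    mean_n = sum(log_n) / count
    # Slope of the regression line in log-log space is the exponent a.
    s_cc = sum((c - mean_c) ** 2 for c in log_c)
    s_cn = sum((c - mean_c) * (n - mean_n) for c, n in zip(log_c, log_n))
    a = s_cn / s_cc
    # Intercept in log space gives log(k).
    k = math.exp(mean_n - a * mean_c)
    return k, a


# Synthetic, illustrative data only (assumed values, not from the paper):
# compute budgets in FLOPs and the model size that would minimize loss
# at each budget under an assumed power law with k = 0.1, a = 0.5.
compute_budgets = [1e17, 1e18, 1e19, 1e20]
assumed_k, assumed_a = 0.1, 0.5
best_sizes = [assumed_k * c ** assumed_a for c in compute_budgets]

k, a = fit_power_law(compute_budgets, best_sizes)
print(f"fitted exponent a = {a:.3f}, coefficient k = {k:.3g}")
```

In a real study, each `best_sizes` entry would come from training several model sizes at a fixed compute budget and selecting the one with the lowest held-out loss. The fitted exponent then indicates how quickly model size should grow as compute increases.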

References

"Moreover, the choices of model size and compute have not been systematically studied and it's unclear whether they are optimal and how they should scale with available data."

Source: Generative Medical Event Models Improve with Scale (arXiv:2508.12104, Waxler et al., 16 Aug 2025), Section 1: Introduction
