
Simultaneous imputation for multi-block structured and sporadic missingness in integrated EHR data

Develop an efficient matrix imputation method for the high-dimensional, approximately low-rank matrices that arise from integrated Electronic Health Records (EHR) datasets. The method should simultaneously handle multiple disjoint structured missing blocks and heterogeneous sporadic missing entries. It should avoid the information loss incurred by merging the missing blocks into one, and improve on existing multi-block-only approaches, which ignore sporadic missingness.
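To make the setting concrete, the sketch below imputes an approximately low-rank matrix from an arbitrary entry-level mask with a generic Soft-Impute-style iteration (repeated SVD with soft-thresholded singular values). This is a swapped-in illustrative baseline, not Macomss and not a solution to the open problem: it uses every observed entry whatever the missingness pattern, but it neither exploits the block structure nor addresses efficiency. The function name `soft_impute`, the threshold `shrink`, and the toy matrix sizes are all hypothetical choices.

```python
import numpy as np


def soft_impute(Y, observed, shrink, n_iter=500, tol=1e-7):
    """Soft-Impute-style low-rank completion under an arbitrary boolean mask.

    Illustrative baseline only (hypothetical helper): not the Macomss method,
    and it ignores block structure. Values of Y at unobserved positions are
    never read, so every observed entry is used and nothing else.
    """
    Z = np.where(observed, Y, 0.0)  # initial estimate: zero-filled data
    for _ in range(n_iter):
        filled = np.where(observed, Y, Z)  # observed values + current guesses
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - shrink, 0.0)  # soft-threshold the singular values
        Z_new = (U * s) @ Vt
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:  # stop once successive estimates barely move
            break
    return Z


# Hypothetical toy example: a rank-2 matrix with two disjoint structured
# missing blocks plus roughly 10% sporadic missing entries.
rng = np.random.default_rng(0)
n, p, r = 60, 40, 2
truth = rng.normal(size=(n, r)) @ rng.normal(size=(r, p))

observed = rng.random((n, p)) > 0.10  # sporadic missingness
observed[0:20, 0:10] = False          # structured block 1
observed[20:40, 10:20] = False        # structured block 2 (disjoint from block 1)

estimate = soft_impute(truth, observed, shrink=1.0)
missing = ~observed
rel_err = np.linalg.norm((estimate - truth)[missing]) / np.linalg.norm(truth[missing])
print(f"relative error on missing entries: {rel_err:.3f}")
```

Because the mask is applied entry by entry, no observed value is discarded; the open question is how to get this while also exploiting the multi-block structure efficiently.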


Background

Integrated analyses of EHR data often face two kinds of missingness: structured missingness, arising from systematic gaps across multiple data sources, and sporadic missingness, arising from random, heterogeneous omissions of individual entries. Macomss handles a single structured missing block together with sporadic missingness, but real-world integrations of several datasets can produce multiple disjoint missing blocks (multi-block missingness).
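The distinction between the two kinds of missingness can be illustrated by splitting a mask into structured blocks and sporadic entries. The sketch below assumes that each row belongs to a known data source and each column to a known feature group, and uses a hypothetical heuristic: a (source, feature-group) cell that is entirely unobserved counts as a structured block, and every other missing entry counts as sporadic. The helper name `split_missingness` and the group layout are illustrative assumptions, not part of the source.

```python
import numpy as np


def split_missingness(observed, row_groups, col_groups):
    """Split missing entries into structured blocks and sporadic entries.

    Hypothetical heuristic: a (source, feature-group) cell with no observed
    entry is a structured missing block; all other missing entries are
    treated as sporadic.
    """
    structured = np.zeros_like(observed, dtype=bool)
    blocks = []
    for rg in np.unique(row_groups):
        rows = row_groups == rg
        for cg in np.unique(col_groups):
            cols = col_groups == cg
            cell = observed[np.ix_(rows, cols)]
            if not cell.any():  # whole cell missing -> structured block
                structured[np.ix_(rows, cols)] = True
                blocks.append((int(rg), int(cg)))
    sporadic = ~observed & ~structured
    return blocks, structured, sporadic


rng = np.random.default_rng(1)
row_groups = np.repeat([0, 1, 2], 30)   # three hypothetical EHR sources
col_groups = np.repeat([0, 1, 2], 10)   # three hypothetical feature groups
observed = rng.random((90, 30)) > 0.05  # about 5% sporadic missingness
observed[np.ix_(row_groups == 0, col_groups == 2)] = False  # source 0 lacks group 2
observed[np.ix_(row_groups == 1, col_groups == 0)] = False  # source 1 lacks group 0

blocks, structured, sporadic = split_missingness(observed, row_groups, col_groups)
print(blocks)  # -> [(0, 2), (1, 0)]
```

The two blocks are disjoint in both rows and columns, which is the multi-block situation that a single-block method does not cover directly.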

Existing methods for multi-block missingness ignore sporadic missingness and therefore cannot accommodate the heterogeneous entry-level gaps typical of EHR data. A pragmatic workaround is to merge the missing blocks into a single large block and apply a single-block method; however, the merged region then contains entries that were actually observed, and treating them as missing discards information. The authors explicitly identify the need for efficient methods that handle multi-block structured missingness and sporadic missingness simultaneously.
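The cost of the merging workaround can be quantified by counting the observed entries that fall inside the merged block. The sketch assumes one simple merge rule, taking the union of the blocks' rows crossed with the union of their columns; other merge rules are possible, and the helper name `merged_block_loss` and the matrix sizes are hypothetical.

```python
import numpy as np


def merged_block_loss(observed, blocks):
    """Count observed entries discarded when missing blocks are merged.

    Assumed merge rule (illustrative): union of the blocks' rows crossed
    with the union of their columns. Each block is (row_slice, col_slice).
    Returns (observed entries lost, size of the merged block).
    """
    n, p = observed.shape
    rows = np.zeros(n, dtype=bool)
    cols = np.zeros(p, dtype=bool)
    for rs, cs in blocks:
        rows[rs] = True
        cols[cs] = True
    merged = rows[:, None] & cols[None, :]  # True inside the merged block
    lost = observed & merged                # observed entries now treated as missing
    return int(lost.sum()), int(merged.sum())


# Two disjoint 40 x 20 missing blocks in an otherwise fully observed matrix.
blocks = [(slice(0, 40), slice(0, 20)), (slice(40, 80), slice(20, 40))]
observed = np.ones((100, 50), dtype=bool)
for rs, cs in blocks:
    observed[rs, cs] = False

lost, size = merged_block_loss(observed, blocks)
print(f"{lost} of {size} entries in the merged block were actually observed")
# -> 1600 of 3200 entries in the merged block were actually observed
```

In this toy layout half of the merged block consists of observed entries. With sporadic missingness also present, a few of those entries would already be missing, so the loss would be slightly smaller, but the same mechanism applies.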

References

Efficiently imputing multi-block and sporadic missingness simultaneously remains an open problem that can contribute to more general and applicable integrated analysis.

Integrated Analysis for Electronic Health Records with Structured and Sporadic Missingness (Tan et al., arXiv:2506.09208, 10 Jun 2025), Discussion (Section 5)