Two-Sample Testing with Block-Wise Missingness in Multi-Source Data (2508.17411v1)
Abstract: Multi-source and multi-modal datasets are increasingly common in scientific research, yet they often exhibit block-wise missingness, where entire data sources or modalities are systematically absent for subsets of subjects. This structured form of missingness presents significant challenges for statistical analysis, particularly for two-sample hypothesis testing. Standard approaches such as imputation or complete-case analysis can introduce bias or result in substantial information loss, especially when the missingness mechanism is not random. To address this methodological gap, we propose the Block-Pattern Enhanced Test (BPET), a general framework for two-sample testing that directly accounts for block-wise missingness without requiring imputation or deletion of observations. As a concrete instantiation, we develop the Block-wise Rank In Similarity graph Edge-count (BRISE) test, which extends rank-based similarity graph methods to settings with block-wise missing data. Under mild conditions, we establish that the null distribution of BRISE converges to a chi-squared distribution. Simulation studies show that BRISE consistently controls the type I error rate and achieves good statistical power under a wide range of alternatives. Applications to two real-world datasets with block-wise missingness further demonstrate the practical utility of our method in identifying meaningful distributional differences.
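To make the notion of block-wise missingness concrete, the following minimal sketch groups subjects by which data sources they have and counts how many a complete-case analysis would retain. It is only an illustration of the data structure described in the abstract, not the BPET or BRISE procedure. The function name `block_pattern`, the two-source layout, and the toy values are all assumptions made for this example.

```python
from collections import Counter

def block_pattern(row, blocks):
    """Return a tuple of booleans, one per block: True if every
    column of that block is observed (not None) for this subject."""
    return tuple(all(row[j] is not None for j in cols) for cols in blocks)

# Hypothetical layout: source A occupies columns 0-1, source B columns 2-4.
blocks = [[0, 1], [2, 3, 4]]

# Toy data (made up): None marks a missing entry. Whole sources are
# absent for some subjects, which is what "block-wise" means here.
data = [
    [0.3, 1.2, None, None, None],   # source B missing
    [0.8, 0.1, None, None, None],   # source B missing
    [1.1, 0.4, 2.0, 0.7, 1.5],      # fully observed
    [0.2, 0.9, 1.3, 0.6, 0.8],      # fully observed
    [None, None, 0.5, 1.9, 0.4],    # source A missing
    [None, None, 1.7, 0.2, 1.0],    # source A missing
]

patterns = [block_pattern(row, blocks) for row in data]
pattern_counts = Counter(patterns)

# Complete-case analysis keeps only subjects observed in every block.
n_complete = sum(all(p) for p in patterns)

print(pattern_counts)
print(f"complete cases: {n_complete} of {len(data)}")
```

In this toy example only a third of the subjects survive complete-case deletion, which is the kind of information loss the abstract attributes to standard approaches.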