
Improving Survey Inference in Two-phase Designs Using Bayesian Machine Learning

Published 7 Jun 2023 in stat.ME | (2306.04119v1)

Abstract: The two-phase sampling design is a cost-effective sampling strategy that has been widely used in public health research. The conventional approach in this design is to create subsample-specific weights that adjust for the probability of selection and response in the second phase. However, these weights can be highly variable, which in turn results in unstable weighted analyses. Alternatively, we can use the rich data collected in the first phase of the study to improve the survey inference of the second-phase sample. In this paper, we use a Bayesian tree-based multiple imputation (MI) approach for estimating population means under a two-phase survey design. We demonstrate how to incorporate complex survey design features, such as strata, clusters, and weights, into the imputation procedure. We use a simulation study to evaluate the performance of the tree-based MI approach in comparison to the alternative weighted analyses using the subsample weights. We find that the tree-based MI method outperforms weighting methods, with smaller bias, reduced root mean squared error, and narrower 95% confidence intervals whose coverage rates are closer to the nominal level. We illustrate the application of the proposed method by estimating the prevalence of diabetes among the United States non-institutionalized adult population using the fasting blood glucose data collected only on a subsample of participants in the 2017-2018 National Health and Nutrition Examination Survey.
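
The abstract's point that highly variable subsample weights lead to unstable weighted analyses can be illustrated with a small sketch. This is not the paper's method or code. It assumes a ratio-type (Hajek) weighted mean and uses Kish's approximate design effect, 1 + CV(w)^2, as a rough measure of the precision lost to unequal weights. The function names and the toy data are hypothetical.

```python
def weighted_mean(y, w):
    """Ratio-type (Hajek) weighted mean: sum(w * y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def kish_design_effect(w):
    """Kish's approximation 1 + CV(w)^2, computed as n * sum(w^2) / (sum(w))^2."""
    n = len(w)
    return n * sum(wi * wi for wi in w) / sum(w) ** 2


def effective_sample_size(w):
    """Nominal sample size deflated by the Kish design effect."""
    return len(w) / kish_design_effect(w)


# Toy second-phase subsample (hypothetical values): binary outcome, two weight sets
# that both sum to the same population total of 600.
y = [1, 0, 0, 1, 0, 0]
equal_w = [100.0] * 6
variable_w = [10.0, 10.0, 10.0, 10.0, 10.0, 550.0]

for label, w in [("equal", equal_w), ("variable", variable_w)]:
    print(
        label,
        round(weighted_mean(y, w), 4),
        round(kish_design_effect(w), 2),
        round(effective_sample_size(w), 2),
    )
```

With the variable weights, a single unit carries most of the total weight, so the estimate is driven almost entirely by that unit's outcome and the effective sample size falls well below the nominal six. Imputation-based approaches such as the one described in the abstract aim to avoid this dependence on a few heavily weighted observations.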
