Semiparametric Efficient Data Integration Using the Dual-Frame Sampling Framework

Published 13 Jan 2026 in stat.ME and math.ST | (2601.08707v1)

Abstract: Integrating probability and non-probability samples is increasingly important, yet unknown sampling mechanisms in non-probability sources complicate identification and efficient estimation. We develop semiparametric theory for dual-frame data integration and propose two complementary estimators. The first models the non-probability inclusion probability parametrically and attains the semiparametric efficiency bound. We introduce an identifiability condition based on strong monotonicity that identifies sampling-model parameters without instrumental variables, even under informative (non-ignorable) selection, using auxiliary information from the probability sample; it remains valid without record linkage between samples. The second estimator, motivated by a two-stage sampling approximation, avoids explicit modeling of the non-probability mechanism; though not fully efficient, it is efficient within a restricted augmentation class and is robust to misspecification. Simulations and an application to the Culture and Community in a Time of Crisis public simulation dataset show efficiency gains under correct specification and stable performance under misspecification and weak identification. Methods are implemented in the R package dfSEDI.

Authors (2)
