Semisupervised score based matching algorithm to evaluate the effect of public health interventions (2403.12367v1)
Abstract: Multivariate matching algorithms "pair" similar study units in an observational study to remove potential bias and confounding effects caused by the absence of randomization. In one-to-one multivariate matching, a large number of "pairs" to be matched means both more information from a large sample and a larger number of matching tasks. An algorithm that matches the pairs well while remaining efficient and requiring only limited auxiliary matching knowledge, provided by domain experts through a "training" set of paired units, is therefore of practical interest. We propose a novel one-to-one matching algorithm based on a quadratic score function $S_{\beta}(x_i,x_j)= \beta^T (x_i-x_j)(x_i-x_j)^T \beta$. The weights $\beta$, which can be interpreted as a variable importance measure, are designed to minimize the score difference between paired training units while maximizing the score difference between unpaired training units. Further, in the typical but intricate case where the training set is much smaller than the unpaired set, we propose a \underline{s}emisupervised \underline{c}ompanion \underline{o}ne-\underline{t}o-\underline{o}ne \underline{m}atching \underline{a}lgorithm (SCOTOMA) that makes the best use of the unpaired units. The proposed weight estimator is proved to be consistent when the true matching criterion is indeed the quadratic score function. When the model assumptions are violated, we demonstrate through a series of simulations that the proposed algorithm still outperforms some popular competing matching algorithms. We apply the proposed algorithm to a real-world study investigating the effect of in-person schooling on community COVID-19 transmission rates for policy-making purposes.
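As a minimal sketch of the quadratic score function in the abstract, the snippet below evaluates $S_{\beta}(x_i,x_j)$ for one pair of units and checks that it equals the squared projection $(\beta^T (x_i-x_j))^2$, which follows directly from the definition. The function name `quadratic_score`, the weight vector, and the covariate values are hypothetical and chosen only for illustration; estimating $\beta$ from a training set is not shown.

```python
import numpy as np


def quadratic_score(beta, x_i, x_j):
    """Evaluate S_beta(x_i, x_j) = beta^T (x_i - x_j)(x_i - x_j)^T beta."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    outer = np.outer(diff, diff)  # rank-one matrix (x_i - x_j)(x_i - x_j)^T
    return float(beta @ outer @ beta)


# Hypothetical weights: the third covariate has weight 0,
# so it does not contribute to the score.
beta = np.array([0.5, 2.0, 0.0])
x_a = np.array([1.0, 3.0, 10.0])
x_b = np.array([2.0, 3.5, -4.0])

score = quadratic_score(beta, x_a, x_b)

# Equivalent form: the square of the weighted difference beta^T (x_a - x_b).
projected = float(beta @ (x_a - x_b)) ** 2

print(score, projected)
```

Because the score is a square, it is never negative and equals zero whenever the weighted difference between two units vanishes. A covariate with zero weight, like the third one here, leaves the score unchanged however far apart the two units are on it, which is consistent with reading $\beta$ as a variable importance measure.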