
Mixed Matrix Completion in Complex Survey Sampling under Heterogeneous Missingness (2402.03954v1)

Published 6 Feb 2024 in stat.ME and stat.ML

Abstract: Modern surveys with large sample sizes and growing mixed-type questionnaires require robust and scalable analysis methods. In this work, we consider recovering a mixed dataframe matrix, obtained by complex survey sampling, with entries following different canonical exponential distributions and subject to heterogeneous missingness. To tackle this challenging task, we propose a two-stage procedure: in the first stage, we model the entry-wise missing mechanism by logistic regression, and in the second stage, we complete the target parameter matrix by maximizing a weighted log-likelihood with a low-rank constraint. We propose a fast and scalable estimation algorithm that achieves sublinear convergence, and the upper bound for the estimation error of the proposed method is rigorously derived. Experimental results support our theoretical claims, and the proposed estimator shows its merits compared to other existing methods. The proposed method is applied to analyze the National Health and Nutrition Examination Survey data.
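The two-stage idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: it swaps the low-rank constraint for a nuclear-norm penalty solved by proximal gradient with singular value thresholding, handles only Gaussian and Bernoulli columns, and assumes the weights are survey design weights divided by the estimated observation probabilities. All function names and parameters (`fit_logistic`, `svt`, `complete`, `lam`, `is_binary`) are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, steps=500, lr=0.5):
    """Stage 1 (sketch): logistic regression for the probability that an
    entry is observed, fitted by plain gradient ascent on the log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta += lr * X.T @ (y - p) / len(y)
    return beta

def svt(M, tau):
    """Soft-threshold the singular values of M by tau
    (the proximal operator of tau times the nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def complete(Y, observed, weights, is_binary, lam=0.5, steps=200):
    """Stage 2 (sketch): proximal gradient on a weighted negative
    log-likelihood plus lam * nuclear norm (a relaxation, assumed here).

    Y         : data matrix; unobserved entries may hold any value or NaN
    observed  : boolean matrix, True where Y is observed
    weights   : assumed to be design weight / estimated observation probability
    is_binary : boolean vector, True for Bernoulli columns, False for Gaussian
    """
    is_binary = np.asarray(is_binary, dtype=bool)
    W = np.where(observed, weights, 0.0)
    W = W / W.max()                      # rescale so a unit step size is safe
    Y0 = np.where(observed, Y, 0.0)      # masked entries never enter the gradient
    theta = np.zeros(Y0.shape, dtype=float)
    for _ in range(steps):
        # Mean of each entry under its canonical exponential family:
        # identity for Gaussian columns, sigmoid for Bernoulli columns.
        mean = theta.copy()
        mean[:, is_binary] = 1.0 / (1.0 + np.exp(-theta[:, is_binary]))
        grad = W * (mean - Y0)
        theta = svt(theta - grad, lam)
    return theta
```

In this sketch, the fitted coefficients from `fit_logistic` would be turned into entry-wise observation probabilities, whose inverses (times the design weights) are passed as `weights` to `complete`. The returned `theta` is the estimated natural-parameter matrix; applying the sigmoid to its Bernoulli columns gives imputed probabilities.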

