
Privacy Preserving Data Imputation via Multi-party Computation for Medical Applications (2405.18878v1)

Published 29 May 2024 in cs.CR and cs.LG

Abstract: Handling missing data is crucial in machine learning, since many datasets contain gaps due to errors or non-response. Traditional methods such as listwise deletion are simple but inadequate; the literature offers more sophisticated and effective imputation methods that preserve sample size and improve accuracy. However, these methods require access to the whole dataset, which conflicts with privacy regulations when the data is distributed among multiple sources. In the medical and healthcare domain in particular, such access reveals sensitive information about patients. This study addresses privacy-preserving imputation for sensitive data using secure multi-party computation, which enables joint computation without revealing any party's sensitive information. We realize the mean, median, regression, and kNN imputation methods in a privacy-preserving way. We specifically target the medical and healthcare domains, given the importance of protecting patient data, and showcase our methods on a diabetes dataset. Experiments on this dataset validated the correctness of our privacy-preserving imputation methods, yielding a largest error of around $3 \times 10^{-3}$, closely matching the plaintext methods. We also analyzed how our methods scale with the number of samples to show their applicability to real-world healthcare problems. Our analysis demonstrated that all our methods scale linearly with the number of samples. Except for kNN, the runtimes of our methods indicate that they can be used for large datasets.
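To make the idea of privacy-preserving mean imputation concrete, the following is a minimal sketch based on textbook additive secret sharing. It is not the protocol used in the paper, and it makes several assumptions not found in the abstract: the function names (`share`, `reconstruct`, `secure_mean`), the field modulus `PRIME`, the fixed-point factor `SCALE`, and the simplification that each party holds some observed values of the same feature. Only the global sum and count are reconstructed; individual values stay hidden behind random shares.

```python
import secrets

PRIME = 2**61 - 1   # assumed field modulus for the shares (illustrative choice)
SCALE = 10**3       # assumed fixed-point factor to encode decimals as integers


def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares, mapping the result back to a signed integer."""
    total = sum(shares) % PRIME
    return total - PRIME if total > PRIME // 2 else total


def secure_mean(party_values):
    """Compute the mean of all observed values held by several parties.

    party_values: one list of observed (non-missing) floats per party.
    Each party secret-shares its local sum and count, so no party sees
    another party's raw values or local aggregates.
    """
    n = len(party_values)
    # Each party encodes its local sum in fixed point and shares sum and count.
    sum_shares = [share(round(sum(v) * SCALE), n) for v in party_values]
    cnt_shares = [share(len(v), n) for v in party_values]
    # Party j adds up the j-th share it received from every party.
    agg_sum = [sum(s[j] for s in sum_shares) % PRIME for j in range(n)]
    agg_cnt = [sum(c[j] for c in cnt_shares) % PRIME for j in range(n)]
    # Only the aggregated shares are opened, revealing the global sum and count.
    total = reconstruct(agg_sum)
    count = reconstruct(agg_cnt)
    return total / SCALE / count


# Three hypothetical hospitals, each holding a few observed glucose readings.
hospitals = [[100.0, 120.0], [90.0], [110.0, 130.0]]
imputed_value = secure_mean(hospitals)
print(imputed_value)
```

Each party would then fill its own missing entries with `imputed_value`. The sketch assumes at least one observed value overall and honest parties; median, regression, and kNN imputation need comparisons and multiplications on shares, which require a richer protocol than plain additive sharing.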

